# **Nobuko Yoshida (Ed.)**

# **Programming Languages and Systems**

**30th European Symposium on Programming, ESOP 2021 Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2021 Luxembourg City, Luxembourg, March 27 – April 1, 2021 Proceedings**

# Lecture Notes in Computer Science 12648

Founding Editors

- Gerhard Goos, Germany
- Juris Hartmanis, USA

# Editorial Board Members

- Elisa Bertino, USA
- Wen Gao, China
- Bernhard Steffen, Germany
- Gerhard Woeginger, Germany
- Moti Yung, USA

# Advanced Research in Computing and Software Science Subline of Lecture Notes in Computer Science

Subline Series Editors

- Giorgio Ausiello, University of Rome 'La Sapienza', Italy
- Vladimiro Sassone, University of Southampton, UK

Subline Advisory Board

- Susanne Albers, TU Munich, Germany
- Benjamin C. Pierce, University of Pennsylvania, USA
- Bernhard Steffen, University of Dortmund, Germany
- Deng Xiaotie, Peking University, Beijing, China
- Jeannette M. Wing, Microsoft Research, Redmond, WA, USA

More information about this subseries at http://www.springer.com/series/7407


Editor: Nobuko Yoshida, Imperial College London, UK

ISSN 0302-9743 · ISSN 1611-3349 (electronic) · Lecture Notes in Computer Science · ISBN 978-3-030-72018-6 · ISBN 978-3-030-72019-3 (eBook) · https://doi.org/10.1007/978-3-030-72019-3

LNCS Sublibrary: SL1 – Theoretical Computer Science and General Issues

© The Editor(s) (if applicable) and The Author(s) 2021. This book is an open access publication.

Open Access This book is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this book are included in the book's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the book's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

# ETAPS Foreword

Welcome to the 24th ETAPS! ETAPS 2021 was originally planned to take place in Luxembourg, in its beautiful capital Luxembourg City. Because of the Covid-19 pandemic, this was changed to an online event.

ETAPS 2021 was the 24th instance of the European Joint Conferences on Theory and Practice of Software. ETAPS is an annual federated conference established in 1998, and consists of four conferences: ESOP, FASE, FoSSaCS, and TACAS. Each conference has its own Program Committee (PC) and its own Steering Committee (SC). The conferences cover various aspects of software systems, ranging from theoretical computer science to foundations of programming languages, analysis tools, and formal approaches to software engineering. Organising these conferences in a coherent, highly synchronised conference programme enables researchers to participate in an exciting event, having the possibility to meet many colleagues working in different directions in the field, and to easily attend talks of different conferences. On the weekend before the main conference, numerous satellite workshops take place that attract many researchers from all over the globe.

ETAPS 2021 received 260 submissions in total, 115 of which were accepted, yielding an overall acceptance rate of 44.2%. I thank all the authors for their interest in ETAPS, all the reviewers for their reviewing efforts, the PC members for their contributions, and in particular the PC (co-)chairs for their hard work in running this entire intensive process. Last but not least, my congratulations to all authors of the accepted papers!

ETAPS 2021 featured the unifying invited speakers Scott Smolka (Stony Brook University) and Jane Hillston (University of Edinburgh) and the conference-specific invited speakers Işil Dillig (University of Texas at Austin) for ESOP and Willem Visser (Stellenbosch University) for FASE. Invited tutorials were provided by Erika Ábrahám (RWTH Aachen University) on analysis of hybrid systems and Madhusudan Parthasarathy (University of Illinois at Urbana-Champaign) on combining machine learning and formal methods.

ETAPS 2021 was originally supposed to take place in Luxembourg City, Luxembourg, organized by the SnT – Interdisciplinary Centre for Security, Reliability and Trust, University of Luxembourg. The University of Luxembourg, founded in 2003, is one of the best and most international young universities, with 6,700 students from 129 countries and 1,331 academics from all over the globe. The local organisation team consisted of Peter Y.A. Ryan (general chair), Peter B. Roenne (organisation chair), Joaquin Garcia-Alfaro (workshop chair), Magali Martin (event manager), David Mestel (publicity chair), and Alfredo Rial (local proceedings chair).

ETAPS 2021 was further supported by the following associations and societies: ETAPS e.V., EATCS (European Association for Theoretical Computer Science), EAPLS (European Association for Programming Languages and Systems), and EASST (European Association of Software Science and Technology).

The ETAPS Steering Committee consists of an Executive Board, and representatives of the individual ETAPS conferences, as well as representatives of EATCS, EAPLS, and EASST. The Executive Board consists of Holger Hermanns (Saarbrücken), Marieke Huisman (Twente, chair), Jan Kofron (Prague), Barbara König (Duisburg), Gerald Lüttgen (Bamberg), Caterina Urban (INRIA), Tarmo Uustalu (Reykjavik and Tallinn), and Lenore Zuck (Chicago).

Other members of the steering committee are: Patricia Bouyer (Paris), Einar Broch Johnsen (Oslo), Dana Fisman (Be'er Sheva), Jan-Friso Groote (Eindhoven), Esther Guerra (Madrid), Reiko Heckel (Leicester), Joost-Pieter Katoen (Aachen and Twente), Stefan Kiefer (Oxford), Fabrice Kordon (Paris), Jan Křetínský (Munich), Kim G. Larsen (Aalborg), Tiziana Margaria (Limerick), Andrew M. Pitts (Cambridge), Grigore Roșu (Illinois), Peter Ryan (Luxembourg), Don Sannella (Edinburgh), Lutz Schröder (Erlangen), Ilya Sergey (Singapore), Mariëlle Stoelinga (Twente), Gabriele Taentzer (Marburg), Christine Tasson (Paris), Peter Thiemann (Freiburg), Jan Vitek (Prague), Anton Wijs (Eindhoven), Manuel Wimmer (Linz), and Nobuko Yoshida (London).

I'd like to take this opportunity to thank all the authors, attendees, organizers of the satellite workshops, and Springer-Verlag GmbH for their support. I hope you all enjoyed ETAPS 2021.

Finally, a big thanks to Peter, Peter, Magali and their local organisation team for all their enormous efforts to make ETAPS a fantastic online event. I hope there will be a next opportunity to host ETAPS in Luxembourg.

February 2021

Marieke Huisman ETAPS SC Chair ETAPS e.V. President

# Preface

Welcome to the 30th European Symposium on Programming! ESOP 2021 was originally planned to take place in Luxembourg. Because of the COVID-19 pandemic, this was changed to an online event. ESOP is one of the European Joint Conferences on Theory and Practice of Software (ETAPS). It is devoted to fundamental issues in the specification, design, analysis, and implementation of programming languages and systems.

This volume contains 24 papers, which the program committee selected among 79 submissions. Each submission received between three and five reviews. After an author response period, the papers were discussed electronically among the 25 PC members and 98 external reviewers. The nine papers for which the PC chair had a conflict of interest (11% of the total submissions) were kindly handled by Patrick Eugster.

The quality of the submissions for ESOP 2021 was astonishing, and very sadly, we had to reject many strong papers. I would like to thank all the authors who submitted their papers to ESOP 2021.

Finally, I truly thank the members of the program committee. I am very impressed by their insightful and constructive reviews – every PC member has contributed very actively to the online discussions under this difficult COVID-19 situation, and supported Patrick and me. It was a real pleasure to work with all of you! I am also grateful to the nearly 100 external reviewers, who provided their expert opinions.

I would like to thank the ESOP 2020 chair Peter Müller for his instant help and guidance on many occasions. I thank all who contributed to the organisation of ESOP – the ESOP steering committee and its chair Peter Thiemann, as well as the ETAPS steering committee and its chair Marieke Huisman, who provided help and guidance. I would also like to thank Alfredo Rial Duran, Barbara König, and Francisco Ferreira for their help with the proceedings.

January 2021 Nobuko Yoshida

# Organization

# Program Committee

- Stephanie Balzer, CMU
- Viviana Bono, Università di Torino
- Brijesh Dongol, University of Surrey
- Marco Gaboardi, Boston University
- Zhenjiang Hu, Peking University
- Hongjin Liang, Nanjing University
- Yu David Liu, SUNY Binghamton
- Alan Schmitt, Inria
- Zhong Shao, Yale University
- Sam Staton, University of Oxford
- Vasco T. Vasconcelos, University of Lisbon
- Tobias Wrigstad, Uppsala University
- Damien Zufferey, MPI-SWS
- Sandrine Blazy, University of Rennes 1 - IRISA
- Patrick Eugster, Università della Svizzera italiana (USI)
- Dan Ghica, University of Birmingham
- Justin Hsu, University of Wisconsin-Madison
- Robbert Krebbers, Radboud University Nijmegen
- Étienne Lozes, I3S, University of Nice & CNRS
- Corina Pasareanu, CMU/NASA Ames Research Center
- Alex Potanin, Victoria University of Wellington
- Guido Salvaneschi, University of St. Gallen
- Taro Sekiyama, National Institute of Informatics
- Alexander J. Summers, University of British Columbia
- Nicolas Wu, Imperial College London
- Nobuko Yoshida, Imperial College London

# Additional Reviewers

Adamek, Jiri Alglave, Jade Álvarez Picallo, Mario Ambal, Guillaume Amtoft, Torben Ancona, Davide Atig, Mohamed Faouzi Avanzini, Martin Bengtson, Jesper

Besson, Frédéric Bodin, Martin Canino, Anthony Casal, Filipe Castegren, Elias Castellan, Simon Chakraborty, Soham Charguéraud, Arthur Chen, Liqian

Chen, Yixuan Chini, Peter Chuprikov, Pavel Cogumbreiro, Tiago Curzi, Gianluca Dagnino, Francesco Dal Lago, Ugo Damiani, Ferruccio Derakhshan, Farzaneh Dexter, Philip Dezani-Ciancaglini, Mariangiola Emoto, Kento Fernandez, Kiko Fromherz, Aymeric Frumin, Daniil Gavazzo, Francesco Gordillo, Pablo Gratzer, Daniel Guéneau, Armaël Iosif, Radu Jacobs, Jules Jiang, Hanru Jiang, Yanyan Jongmans, Sung-Shik Jovanović, Dejan Kaminski, Benjamin Lucien Kerjean, Marie Khayam, Adam Kokologiannakis, Michalis Krishna, Siddharth Laird, James Laporte, Vincent Lemay, Mark Lindley, Sam Long, Yuheng Mamouras, Konstantinos Mangipudi, Shamiek

Maranget, Luc Martínez, Guido Mehrotra, Puneet Miné, Antoine Mordido, Andreia Muroya, Koko Murray, Toby Møgelberg, Rasmus Ejlers New, Max Noizet, Louis Noller, Yannic Novotný, Petr Oliveira Vale, Arthur Orchard, Dominic Padovani, Luca Pagani, Michele Parthasarathy, Gaurav Paviotti, Marco Power, John Poças, Diogo Pérez, Jorge A. Qu, Weihao Rand, Robert Rouvoet, Arjen Sammler, Michael Sato, Tetsuya Sterling, Jonathan Stutz, Felix Matthias Sutre, Grégoire Swamy, Nikhil Takisaka, Toru Toninho, Bernardo Toro, Matias Vene, Varmo Viering, Malte Wang, Di Zufferey, Damien

# Contents



# **The Decidability of Verification under PS 2.0**

Parosh Aziz Abdulla<sup>1</sup>, Mohamed Faouzi Atig<sup>1</sup>, Adwait Godbole<sup>2</sup>, S. Krishna<sup>2</sup>, and Viktor Vafeiadis<sup>3</sup>

> <sup>1</sup> Uppsala University, Uppsala, Sweden {parosh,mohamed_faouzi.atig}@it.uu.se <sup>2</sup> IIT Bombay, Mumbai, India {adwaitg,krishnas}@cse.iitb.ac.in <sup>3</sup> MPI-SWS, Kaiserslautern, Germany viktor@mpi-sws.org

**Abstract.** We consider the reachability problem for finite-state multithreaded programs under the promising semantics (PS 2.0) of Lee et al., which captures most common program transformations. Since reachability is already known to be undecidable in the fragment of PS 2.0 with only release-acquire accesses (PS 2.0-ra), we consider the fragment with only relaxed accesses and promises (PS 2.0-rlx). We show that reachability under PS 2.0-rlx is undecidable in general and that it becomes decidable, albeit non-primitive recursive, if we bound the number of promises. Given these results, we consider a bounded version of the reachability problem. To this end, we bound both the number of promises and of "view-switches", i.e., the number of times the processes may switch their local views of the global memory. We provide a code-to-code translation from an input program under PS 2.0 (with relaxed and release-acquire memory accesses along with promises) to a program under SC, thereby reducing the bounded reachability problem under PS 2.0 to the bounded context-switching problem under SC. We have implemented a tool and tested it on a set of benchmarks, demonstrating that typical bugs in programs can be found with a small bound.

**Keywords:** Model-Checking · Memory Models · Promising Semantics

# **1 Introduction**

An important long-standing open problem in PL research has been to define a weak memory model that captures the semantics of concurrent memory accesses in languages like Java and C/C++. A model is considered good if it can be implemented efficiently (i.e., if it supports all usual compiler optimizations and its accesses are compiled to plain x86/ARM/Power/RISCV accesses), and is easy to reason about. To address this problem, Kang et al. [16] introduced the promising semantics. This was the first model that supported basic invariant reasoning, the DRF guarantee, and even a non-trivial program logic [30].

© The Author(s) 2021. N. Yoshida (Ed.): ESOP 2021, LNCS 12648, pp. 1–29, 2021. https://doi.org/10.1007/978-3-030-72019-3_1

In the promising semantics, the memory is modeled as a set of timestamped messages, each corresponding to a write made by the program. Each process/thread records its own view of the memory—i.e., the latest timestamp for each memory location that it is aware of. A message has the form (x, v, (f, t], V) where x is a location, v a value to be stored for x, (f, t] is the timestamp interval corresponding to the write, and V is the local view of the process who made the write to x. When reading from memory, a process can either return the value stored at the timestamp in its view or advance its view to some larger timestamp and read from that message. When a process p writes to memory location x, a new message with a timestamp larger than p's view of x is created, and p's view is advanced to include the new message. In addition, in order to allow load-store reorderings, a process is allowed to promise a certain write in the future. A promise is also added as a message in the memory, except that the local view of the process is not updated using the timestamp interval in the message. This is done only when the promise is eventually fulfilled. A consistency check is used to ensure that every promised message can be certified (i.e., made fulfillable) by executing that process on its own. Furthermore, this should hold from any future memory (i.e., from any extension of the memory with additional messages). The quantification prevents deadlocks (i.e., prevents processes from making promises they are not able to fulfil). However, the unbounded number of future memories that need to be checked makes the verification of even simple programs practically infeasible. Moreover, a number of transformations based on global value range analysis as well as register promotion were not supported in [16].
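The message/view discipline described above can be made concrete in a few lines. The sketch below is ours, not the paper's formalisation: it keeps only locations, values and timestamp intervals (no attached message views, promises or access modes), and the `Message`/`Memory` names are invented.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Message:
    loc: str
    val: int
    frm: float  # exclusive left end of the interval (f, t]
    to: float   # the write's own timestamp t

class Memory:
    def __init__(self, locs=("x",)):
        # one initial message (x, 0, (0, 0], V_init) per location
        self.msgs = {Message(x, 0, 0.0, 0.0) for x in locs}

    def read(self, view, loc):
        # a process may observe any message at or after its view of loc;
        # this sketch simply picks the latest one and advances the view to it
        m = max((m for m in self.msgs if m.loc == loc and m.to >= view[loc]),
                key=lambda m: m.to)
        view[loc] = m.to
        return m.val

    def write(self, view, loc, val):
        # a fresh message gets a timestamp strictly above the view of loc
        t = view[loc] + 1.0
        self.msgs.add(Message(loc, val, view[loc], t))
        view[loc] = t

mem, view = Memory(), {"x": 0.0}
mem.write(view, "x", 42)
assert mem.read(view, "x") == 42 and view["x"] == 1.0
```

Promises would be extra messages whose timestamps are not yet reflected in the writer's view; they are omitted here.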

To address these concerns, Lee et al. developed a new version of the promising semantics, PS 2.0 [22]. PS 2.0 simplifies the consistency check: instead of checking promise fulfilment from all future memories, PS 2.0 checks for promise fulfilment only from a specially crafted extension of the current memory called the capped memory. PS 2.0 also introduces the notion of reservations, which allows a process to secure a timestamp interval in order to perform a future atomic read-modify-write instruction. The reservation blocks any other message from using that timestamp interval. Because of these changes, PS 2.0 supports register promotion and global value range analysis, while capturing all features (process-local optimizations, DRF guarantees, hardware mappings) of the original promising semantics. Although PS 2.0 can be considered a semantic breakthrough, it is a very complex model: it supports two memory access modes, relaxed (rlx) and release-acquire (ra), along with promises, reservations and certifications.

Let PS 2.0-rlx (resp. PS 2.0-ra) be the fragment of PS 2.0 allowing only relaxed (rlx) (resp. release-acquire (ra)) memory accesses. A natural and fundamental question to investigate is the verification of concurrent programs under PS 2.0. Consider the reachability problem, i.e., whether a given configuration of a concurrent finite-state program is reachable. Reachability with only ra accesses has already been shown to be undecidable [1], even without promises and reservations. That leaves us only the PS 2.0-rlx fragment, which captures the semantics of concurrent 'relaxed' memory accesses in programming languages such as Java and C/C++. We show that if an unbounded number of promises is allowed, the reachability problem under PS 2.0-rlx is undecidable. Undecidability is obtained with an execution with only 2 processes and 3 context switches, where a context is a computation segment in which only one process is active.

Then, we show that reachability under PS 2.0-rlx becomes decidable if we bound the number of promises at any time (however, the total number of promises made within a run can be unbounded). The proof introduces a new memory model based on higher order words, LoHoW, which we show equivalent to PS 2.0-rlx in terms of reachable states. Under the bounded-promises assumption, we use the decidability of the coverability problem for well-structured transition systems (WSTS) [7,13] to show that the reachability problem for LoHoW with a bounded number of promises is decidable. Further, PS 2.0-rlx without promises and reservations has a non-primitive-recursive lower bound. Our decidability result covers the relaxed fragment of the RC11 model [20,16] (which matches the PS 2.0-rlx fragment with no promises). Given the high complexity for PS 2.0-rlx and the undecidability of PS 2.0-ra, we next consider a bounded version of the reachability problem. To this end, we propose a parametric under-approximation in the spirit of context bounding [9,33,21,26,24,29,1,3]. The aim of context bounding is to restrict the otherwise unbounded interaction between processes, and it has been shown experimentally in the case of SC programs to maintain enough behaviour coverage for bug detection [24,29]. The concept of context bounding has been extended to weak memory models. For instance, for RA, Abdulla et al. [1] proposed view bounding, using the notion of view-switching messages and a translation that keeps track of the causality between different variables. Since PS 2.0 subsumes RA, we propose a bounding notion that extends view bounding.

Using our new bounding notion, we propose a source-to-source translation from programs under PS 2.0 to context-bounded executions of the transformed program under SC. The challenges in our translation differ a lot from that in [1], as we have to provide a procedure that (i) handles different memory accesses rlx and ra, (ii) guesses the promises and reservations in a non-deterministic manner, and (iii) verifies that promises are fulfilled using the capped memory.

We have implemented this reduction in a tool, PS2SC. Our experimental results demonstrate the effectiveness of our approach. We exhibit cases where hard-to-find bugs are detectable using a small view-bound. Our tool displays resilience to trivial changes in the position of bugs and the order of processes. Further, in our code-to-code translation, the mechanism for making and certifying promises and reservations is isolated in one module, and can easily be changed to cover different variants of the promising semantics.

For lack of space, detailed proofs can be found in [5].

# **2 Preliminaries**

In this section, we introduce the notation that will be used throughout.

**Notations.** Given two natural numbers $i, j \in \mathbb{N}$ s.t. $i \leq j$, we use $[i, j]$ to denote $\{k \mid i \leq k \leq j\}$. Let $A$ and $B$ be two sets. We use $f : A \to B$ to denote that $f$ is a function from $A$ to $B$. We define $f[a \mapsto b]$ to be the function $f'$ s.t. $f'(a) = b$ and $f'(a') = f(a')$ for all $a' \neq a$. For a binary relation $R$, we use $[R]^*$ to denote its reflexive and transitive closure. Given an alphabet $\Sigma$, we use $\Sigma^*$ (resp. $\Sigma^+$) to denote the set of possibly empty (resp. non-empty) finite words (also called simple words) over $\Sigma$. A higher order word over $\Sigma$ is an element of $(\Sigma^*)^*$ (i.e., a word of words). Let $w = a_1 a_2 \cdots a_n$ be a simple word over $\Sigma$; we use $|w|$ to denote the length of $w$. Given an index $i$ in $[1, |w|]$, we use $w[i]$ to denote the $i$-th letter of $w$. Given two indices $i$ and $j$ s.t. $1 \leq i \leq j \leq |w|$, we use $w[i, j]$ to denote the word $a_i a_{i+1} \cdots a_j$. Sometimes, we view a word as a function from $[1, |w|]$ to $\Sigma$.
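As a quick sanity check of the word notation, the following snippet mirrors the definitions with Python tuples standing in for words (1-indexing is the paper's convention; the translation to Python's 0-indexed slices is ours):

```python
w = ("a", "b", "c", "d")                 # a simple word over Sigma
assert len(w) == 4                       # |w|
assert w[2 - 1] == "b"                   # w[i] is 1-indexed in the paper
assert w[2 - 1 : 4] == ("b", "c", "d")   # w[i, j] = a_i a_{i+1} ... a_j
hw = (("a", "b"), (), ("c",))            # a higher order word: a word of words
assert all(isinstance(u, tuple) for u in hw)
```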

**Program Syntax.** The simple programming language we use is described in Figure 1. A program Prog consists of a set Loc of (global) variables or memory locations, and a set P of processes. Each process p declares a set Reg(p) of (local) registers followed by a sequence of labeled instructions. We assume that these sets of registers are disjoint and we use Reg := ∪<sub>p</sub> Reg(p) to denote their union. We also assume a (potentially unbounded) data domain Val from which the registers and locations take values. All locations and registers are assumed to be initialized with the special value 0 ∈ Val (if not mentioned otherwise). An instruction i is of the form λ : s where λ is a unique label and s is a statement. We use L<sub>p</sub> to denote the set of all labels of the process p, and L = ∪<sub>p∈P</sub> L<sub>p</sub> the set of all labels of all processes. We assume that the execution of the process p always starts with a unique initial instruction labeled by λ<sup>p</sup><sub>init</sub>.

Fig. 1: Syntax of programs.

A write instruction x<sup>o</sup> = \$r assigns the value of register \$r to the location x, where o denotes the access mode. If o = rlx, the write is a relaxed write, while if o = ra, it is a release write. A read instruction \$r = x<sup>o</sup> reads the value of the location x into the local register \$r. Again, if the access mode o = rlx, it is a relaxed read, and if o = ra, it is an acquire read. Atomic updates or RMW instructions are either compare-and-swap (**CAS**<sup>o<sub>r</sub>,o<sub>w</sub></sup>) or **FADD**<sup>o<sub>r</sub>,o<sub>w</sub></sup>. Both have a pair of access modes (o<sub>r</sub>, o<sub>w</sub> ∈ {rel, acq, rlx}) for the same location – a read followed by a write. Following [22], **FADD**(x, v) stores the value of x into a register \$r and adds v to x, while **CAS**(x, v<sub>1</sub>, v<sub>2</sub>) compares an expected value v<sub>1</sub> to the value in x and, if the values are the same, sets the value of x to v<sub>2</sub>. The old value of x is then stored in \$r. A local assignment instruction \$r = e assigns to the register \$r the value of e, where e is an expression over a set of operators, constants, and the contents of the registers of the current process, but not referring to the set of locations. The fence instruction SC-fence is used to enforce sequential consistency when it is placed between two memory access operations. For simplicity, we write assume(x = e) instead of \$r = x; assume(\$r = e). This notation extends in the straightforward manner to conditional statements.
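Ignoring access modes and the weak-memory machinery, the value-level behaviour of **FADD** and **CAS** described above can be sketched sequentially as follows (the dict-based `mem` and the return-the-old-value convention are our illustration):

```python
def fadd(mem, x, v):
    old = mem[x]       # the old value of x goes to the register $r
    mem[x] = old + v   # x is incremented by v
    return old

def cas(mem, x, v1, v2):
    old = mem[x]
    if old == v1:      # only swap when the expected value matches
        mem[x] = v2
    return old         # the old value of x is stored in $r either way

mem = {"x": 5}
assert fadd(mem, "x", 3) == 5 and mem["x"] == 8
assert cas(mem, "x", 8, 0) == 8 and mem["x"] == 0
assert cas(mem, "x", 7, 1) == 0 and mem["x"] == 0  # failed CAS leaves x alone
```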

# **3 The Promising Semantics**

In this section, we recall the promising semantics PS 2.0 [22]. We present here PS 2.0 with three memory access modes: relaxed, release writes (rel) and acquire reads (acq). Read-modify-write (RMW) instructions have two access modes – one for the read and one for the write. We set aside release and acquire fences (and the subsequent access modes), since they do not affect the results of this paper.

**Timestamps.** PS 2.0 uses timestamps to maintain a total order over all the writes to the same variable. We assume an infinite set of timestamps Time, densely totally ordered by ≤, with 0 being the minimum element. A view is a timestamp function V : Loc → Time that records the largest known timestamp for each location. Let T be the set containing all the timestamp functions, along with the special symbol ⊥. Let V<sub>init</sub> represent the initial view, where all locations are mapped to 0. Given two views V and V′, we write V ≤ V′ to denote that V(x) ≤ V′(x) for every x ∈ Loc. The merge operation V ⊔ V′ between two views V and V′ returns their pointwise maximum, i.e., (V ⊔ V′)(y) is the maximum of V(y) and V′(y). Let I denote the set of all intervals over Time. The timestamp intervals in I have the form (f, t] where either f = t = 0 or f < t, with f, t ∈ Time. Given an interval I = (f, t] ∈ I, I.frm and I.to denote f and t, respectively.
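The order and merge operations on views are just pointwise comparisons and maxima; a small sketch with dict-based views (the function names are ours):

```python
def view_leq(v1, v2):
    # V <= V' iff V(x) <= V'(x) for every location x
    return all(v1[x] <= v2[x] for x in v1)

def merge(v1, v2):
    # pointwise maximum, written V ⊔ V' in the text
    return {x: max(v1[x], v2[x]) for x in v1}

v_init = {"x": 0.0, "y": 0.0}
v1 = {"x": 2.0, "y": 1.0}
v2 = {"x": 1.0, "y": 3.0}
assert view_leq(v_init, v1)
assert merge(v1, v2) == {"x": 2.0, "y": 3.0}
assert not view_leq(v1, v2) and not view_leq(v2, v1)  # <= is only a partial order
```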

**Memory.** In PS 2.0, the memory is modelled as a set of concrete messages (which we just call messages) and reservations. Each message represents the effect of a write or a RMW operation, and each reservation is a timestamp interval reserved for future use. In more detail, a message m is a tuple (x, v, (f, t], V) where x ∈ Loc, v ∈ Val, (f, t] ∈ I and V ∈ T. A reservation r is a tuple (x, (f, t]). Note that a reservation, unlike a message, does not commit to any particular value. We use m.loc (r.loc), m.val, m.to (r.to), m.frm (r.frm) and m.View to denote x, v, t, f and V, respectively. Two elements (either messages or reservations) are said to be disjoint, written m<sub>1</sub>#m<sub>2</sub>, if they concern different variables (m<sub>1</sub>.loc ≠ m<sub>2</sub>.loc) or their intervals do not overlap (m<sub>1</sub>.to ≤ m<sub>2</sub>.frm ∨ m<sub>1</sub>.frm ≥ m<sub>2</sub>.to). Two sets of elements M and M′ are disjoint, denoted M#M′, if m#m′ for every m ∈ M and m′ ∈ M′. Two elements m<sub>1</sub>, m<sub>2</sub> are adjacent, denoted Adj(m<sub>1</sub>, m<sub>2</sub>), if m<sub>1</sub>.loc = m<sub>2</sub>.loc and m<sub>1</sub>.to = m<sub>2</sub>.frm. A memory M is a set of pairwise disjoint messages and reservations. Let M̃ denote the subset of M containing only messages (no reservations). For a location x, let M(x) be {m ∈ M | m.loc = x}. Given a view V and a memory M, we write V ∈ M if, for every x ∈ Loc, V(x) = m.to for some message m ∈ M. Let 𝕄 denote the set of all memories.
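The disjointness and adjacency predicates transcribe directly into code. In this sketch (ours), an element is reduced to a (loc, frm, to) triple, which suffices for both definitions:

```python
def disjoint(m1, m2):
    (x1, f1, t1), (x2, f2, t2) = m1, m2
    # different location, or timestamp intervals (f,t] do not overlap
    return x1 != x2 or t1 <= f2 or f1 >= t2

def adjacent(m1, m2):
    (x1, f1, t1), (x2, f2, t2) = m1, m2
    return x1 == x2 and t1 == f2

a = ("x", 0.0, 1.0)
b = ("x", 1.0, 2.0)
c = ("x", 0.5, 1.5)
assert disjoint(a, b) and adjacent(a, b)
assert not disjoint(a, c)            # (0, 1] and (0.5, 1.5] overlap
assert disjoint(a, ("y", 0.0, 1.0))  # different locations never conflict
```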

**Insertion into Memory.** Following [22], a memory M can be extended with a message (due to the execution of a write/RMW instruction) or a reservation m with m.loc = x, m.frm = f and m.to = t in a number of ways:

Additive insertion $M \xleftarrow{A} m$ is defined only if (1) $M \# \{m\}$; (2) if $m$ is a message, then no message $m' \in M$ has $m'.loc = x$ and $m'.frm = t$; and (3) if $m$ is a reservation, then there exists a message $m' \in M$ with $m'.loc = x$ and $m'.to = f$. The extended memory $M \xleftarrow{A} m$ is then $M \cup \{m\}$.

Splitting insertion $M \xleftarrow{S} m$ is defined if $m$ is a message and there exists a message $m' = (x, v', (f, t'], V')$ with $t < t'$ in $M$. Then $M$ is updated to $M \xleftarrow{S} m = (M \setminus \{m'\}) \cup \{m, (x, v', (t, t'], V')\}$.

Lowering insertion $M \xleftarrow{L} m$ is defined only if there exists $m'$ in $M$ that is identical to $m = (x, v, (f, t], V)$ except for $m.View \leq m'.View$. Then $M$ is updated to $M \xleftarrow{L} m = (M \setminus \{m'\}) \cup \{m\}$.
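The three insertion operations can be sketched as set transformations. This is our simplification: messages are flat 5-tuples whose last component is a single number standing in for the attached view, and additive insertion checks only the disjointness condition (1), not conditions (2) and (3):

```python
def additive_insert(M, m):
    # defined only when m is disjoint from every element of M (condition (1))
    loc, _, f, t, _ = m
    assert all(x != loc or t2 <= f or f2 >= t for (x, _, f2, t2, _) in M)
    return M | {m}

def split_insert(M, m, m_old):
    # split m_old = (x, v', (f, t'], V') at t, with t < t'; m takes (f, t]
    loc, v2, f, t2, V2 = m_old
    _, _, _, t, _ = m
    assert m[0] == loc and m[2] == f and t < t2
    return (M - {m_old}) | {m, (loc, v2, t, t2, V2)}

def lower_insert(M, m, m_old):
    # replace m_old by m, identical except for a lower attached view
    assert m[:4] == m_old[:4] and m[4] <= m_old[4]
    return (M - {m_old}) | {m}

M = {("x", 7, 0.0, 2.0, 1.0)}
M2 = split_insert(M, ("x", 1, 0.0, 1.0, 0.0), ("x", 7, 0.0, 2.0, 1.0))
assert ("x", 7, 1.0, 2.0, 1.0) in M2 and ("x", 1, 0.0, 1.0, 0.0) in M2
```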

**Transition System of a Process.** Given a process p ∈ P, a state σ of p is defined by a pair (λ, R) where λ ∈ L is the label of the next instruction to be executed by p and R : Reg → Val maps each register of p to its current value. (Observe that we use the set of all labels L (resp. registers Reg) instead of L<sub>p</sub> (resp. Reg(p)) in the definition of σ just for the sake of simplicity.) Transitions between the states of p are of the form $(\lambda, R) \stackrel{t}{\Rightarrow}_p (\lambda', R')$, where t has one of the following forms: $\epsilon$, rd(o, x, v), wt(o, x, v), U(o<sub>r</sub>, o<sub>w</sub>, x, v<sub>r</sub>, v<sub>w</sub>), and SC-fence. A transition of the form $(\lambda, R) \stackrel{rd(o,x,v)}{\Longrightarrow}_p (\lambda', R')$ denotes the execution of a read instruction of the form \$r = x<sup>o</sup> labeled by λ, where (1) λ′ is the label of the next instruction that can be executed after the instruction labeled by λ, and (2) R′ is the mapping that results from updating the value of the register \$r in R to v. The transition relation $(\lambda, R) \stackrel{t}{\Rightarrow}_p (\lambda', R')$ is defined in a similar manner for the other cases of t, where wt(o, x, v) stands for a write instruction that writes the value v to x, U(o<sub>r</sub>, o<sub>w</sub>, x, v<sub>r</sub>, v<sub>w</sub>) stands for a RMW that reads the value v<sub>r</sub> from x and writes v<sub>w</sub> to it, SC-fence stands for an SC-fence instruction, and $\epsilon$ stands for the execution of the other local instructions. Observe that o, o<sub>r</sub>, o<sub>w</sub> are the access modes, which can be rlx or ra. We use ra for both release and acquire. Finally, we use $(\lambda, R) \stackrel{t}{\rightarrow}_p (\lambda', R')$, with $t \neq \epsilon$, to denote that

$$(\lambda, R) \stackrel{\epsilon}{\Rightarrow}_p \sigma_1 \stackrel{\epsilon}{\Rightarrow}_p \dots \stackrel{\epsilon}{\Rightarrow}_p \sigma_n \stackrel{t}{\Rightarrow}_p \sigma_{n+1} \stackrel{\epsilon}{\Rightarrow}_p \dots \stackrel{\epsilon}{\Rightarrow}_p (\lambda', R').$$

**Machine States.** A machine state MS is a tuple ((J, R), VS, PS, M, G), where J : P → L maps each process p to the label of the next instruction to be executed, R : Reg → Val maps each register to its current value, VS : P → T is the process view map, which maps each process to a view, M is a memory, PS : P → 𝕄 maps each process to a set of messages (called its promise set), and G ∈ T is the global view (that will be used by SC-fences). We use C to denote the set of all machine states. Given a machine state MS = ((J, R), VS, PS, M, G) and a process p, let MS↓p denote (σ, VS(p), PS(p), M, G), with σ = (J(p), R(p)), i.e., the projection of the machine state to the process p. We call MS↓p the process configuration. We use C<sub>p</sub> to denote the set of all process configurations.

The initial machine state MS<sub>init</sub> = ((J<sub>init</sub>, R<sub>init</sub>), VS<sub>init</sub>, PS<sub>init</sub>, M<sub>init</sub>, G<sub>init</sub>) is one where: (1) J<sub>init</sub>(p) is the label of the initial instruction of p; (2) R<sub>init</sub>(\$r) = 0 for every \$r ∈ Reg; (3) for each p, VS<sub>init</sub>(p) = V<sub>init</sub> is the initial view (that maps each location to the timestamp 0); (4) for each p, the set of promises PS<sub>init</sub>(p) is empty; (5) the initial memory M<sub>init</sub> contains exactly one initial message (x, 0, (0, 0], V<sub>init</sub>) per location x; and (6) the initial global view G<sub>init</sub> maps each location to 0.
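Spelling out items (1)-(6), the initial machine state can be built as a plain Python value (the field and function names in this sketch are ours):

```python
def initial_state(procs, locs, regs, init_labels):
    v_init = {x: 0.0 for x in locs}
    return {
        "J": dict(init_labels),                  # (1) next label per process
        "R": {r: 0 for r in regs},               # (2) all registers start at 0
        "VS": {p: dict(v_init) for p in procs},  # (3) every process holds V_init
        "PS": {p: set() for p in procs},         # (4) no outstanding promises
        "M": {(x, 0, 0.0, 0.0) for x in locs},   # (5) one initial message per location
        "G": dict(v_init),                       # (6) global view at 0
    }

ms = initial_state({"p"}, {"x"}, {"$r"}, {"p": "l0"})
assert ms["PS"]["p"] == set() and ("x", 0, 0.0, 0.0) in ms["M"]
```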

**Transition Relation.** We first describe the transition (σ, V, P, M, G) −→p (σ′, V′, P′, M′, G′) between process configurations in Cp, from which we induce the transition relation between machine states.

Fig. 2: A subset of PS 2.0 inference rules at the process level.

Process Relation. The formal definition of −→p is given in Figure 2. Below, we explain these inference rules; the full set of rules can be found in [5].

Read. A process p can read from M by observing a message m = (x, v, (f, t], K) if V(x) ≤ t (i.e., p must not be aware of a later message for x). In the case of a relaxed read rd(rlx, x, v), the process view of x is updated to t, while for an acquire read rd(ra, x, v), the process view is updated to V[x → t] ⊔ K. The memory M, the set of promises P, and the global view G remain the same.
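The view update performed by the read rule can be sketched as follows (a Python fragment under our own naming; `Msg` and `read_view` are illustrative, not the paper's notation):

```python
from collections import namedtuple

# minimal message record: location, value, interval (frm, to], attached view K
Msg = namedtuple("Msg", "loc val frm to view")

def join(v1, v2):
    # pointwise maximum of two views over the same locations
    return {x: max(v1[x], v2[x]) for x in v1}

def read_view(view, msg, mode):
    # the read is enabled only if view(x) <= t: the process must not
    # already be aware of a later message for x
    if view[msg.loc] > msg.to:
        return None
    new_view = dict(view)
    new_view[msg.loc] = msg.to   # rlx: only the entry for x moves to t
    if mode == "ra":             # acquire read: additionally join in K
        new_view = join(new_view, msg.view)
    return new_view
```
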

Write. A process can add a fresh message to the memory (MEMORY : NEW) or fulfil an outstanding promise (MEMORY : FULFILL). The execution of a write wt(rlx, x, v) results in a message m with location x and a timestamp interval (−, t]; the process view for x is then updated to t. In the case of a release write wt(ra, x, v), the updated process view is also attached to m, and the rule ensures that the process has no outstanding promise on x. (MEMORY : FULFILL) also allows splitting a promise interval or lowering its view before fulfilment.

Update. When a process performs an RMW, it first reads a message m = (x, v, (f, t], K) and then writes an update message whose frm timestamp equals t, that is, a message of the form m′ = (x, v′, (t, t′], K′). This forbids any other write from being placed between m and m′. The access modes of the read and write parts of the update follow what has been described for reads and writes above.

Promise, Reservation and Cancellation. A process can non-deterministically promise future writes, provided they are not release writes. This is done by adding a message m with m # M to the memory M and to the set of promises P. Later, a relaxed write instruction can fulfil an existing promise. Recall that the execution of a release write requires the set of promises to be empty, so it cannot be used to fulfil a promise. In the reserve step, the process reserves a timestamp interval to be used by a later RMW instruction reading from a certain message, without fixing the value it will write. A reservation is added both to the memory and to the promise set. The process can drop the reservation from both sets using the cancel step, in a non-deterministic manner.

SC fences. The process view V is merged with the global view G, and V ⊔ G becomes both the updated process view and the updated global view.

**Machine Relation.** We are now ready to define the induced transition relation between machine states. For machine states MS = ((J, R), VS, PS, M, G) and MS′ = ((J′, R′), VS′, PS′, M′, G′), we write MS −→p MS′ iff (1) MS↓p −→p MS′↓p and (2) (J(p′), VS(p′), PS(p′)) = (J′(p′), VS′(p′), PS′(p′)) for all p′ ≠ p.

**Consistency.** According to Lee et al. [22], there is one final requirement on machine states, called consistency, which roughly states that, from every encountered machine state, all the messages promised by a process p can be certified (i.e., made fulfillable) by executing p on its own from a certain future memory (called the capped memory), i.e., an extension of the memory with additional reservations. Before defining consistency, we need to introduce the capped memory.

Cap View, Cap Message and Capped Memory. The last element of a memory M with respect to a location x, denoted by $m_{M,x}$, is the element of M(x) with the highest timestamp among all elements of M(x), i.e., $m_{M,x} = \arg\max_{m \in M(x)} m.\mathsf{to}$. Recall that $\widetilde{M}$ denotes the subset of M containing only messages (no reservations). The cap view of a memory M, denoted by $\widehat{V}_M$, is the view that assigns to each location x the to-timestamp of the message $m_{\widetilde{M},x}$; that is, $\widehat{V}_M = \lambda x.\, m_{\widetilde{M},x}.\mathsf{to}$. The cap message of a memory M with respect to a location x is given by $\widehat{m}_{M,x} = (x,\, m_{\widetilde{M},x}.\mathsf{val},\, (m_{M,x}.\mathsf{to},\, m_{M,x}.\mathsf{to}+1],\, \widehat{V}_M)$.

Then, the capped memory of a memory M with respect to a set of promises P, denoted by $\widehat{M}_P$, is an extension of M defined as follows: (1) for every $m_1, m_2 \in M$ with $m_1.\mathsf{loc} = m_2.\mathsf{loc}$, $m_1.\mathsf{to} < m_2.\mathsf{frm}$, and no message $m' \in M(m_1.\mathsf{loc})$ such that $m_1.\mathsf{to} < m'.\mathsf{to} < m_2.\mathsf{to}$, we include the reservation $(m_1.\mathsf{loc}, (m_1.\mathsf{to}, m_2.\mathsf{frm}])$ in $\widehat{M}_P$; and (2) we include the cap message $\widehat{m}_{M,x}$ in $\widehat{M}_P$ for every location x, unless $m_{M,x}$ is a reservation in P.
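A sketch of the capped-memory construction, under simplifying assumptions (memories as Python lists, integer timestamps; `Msg`/`Res` are our own records, not the paper's notation):

```python
from collections import namedtuple

Msg = namedtuple("Msg", "loc val frm to view")  # a message
Res = namedtuple("Res", "loc frm to")           # a reservation

def cap_view(memory):
    # V-hat_M: the to-timestamp of the last message per location
    # (reservations excluded)
    v = {}
    for e in memory:
        if isinstance(e, Msg) and (e.loc not in v or e.to > v[e.loc]):
            v[e.loc] = e.to
    return v

def capped_memory(memory, P):
    # P: promise set of the certifying process (may contain reservations)
    vcap = cap_view(memory)
    result = list(memory)
    by_loc = {}
    for e in sorted(memory, key=lambda e: e.to):
        by_loc.setdefault(e.loc, []).append(e)
    for x, es in by_loc.items():
        # (1) block every internal gap with a reservation
        for a, b in zip(es, es[1:]):
            if a.to < b.frm:
                result.append(Res(x, a.to, b.frm))
        # (2) append the cap message, unless the last element of M(x)
        #     is a reservation belonging to P
        last = es[-1]
        last_msg = [e for e in es if isinstance(e, Msg)][-1]
        if not (isinstance(last, Res) and last in P):
            result.append(Msg(x, last_msg.val, last.to, last.to + 1, vcap))
    return result
```
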

Consistency. A machine state MS = ((J, R), VS, PS, M, G) is consistent if every process p can certify/fulfil all its promises from the capped memory $\widehat{M}_{PS(p)}$, i.e., $((J, R), VS, PS, \widehat{M}_{PS(p)}, G)\,[\longrightarrow_p]^*\,((J', R'), VS', PS', M', G')$ with PS′(p) = ∅.

**The Reachability Problem in PS 2.0.** A run of Prog is a sequence of the form $MS_0\,[\longrightarrow_{p_{i_1}}]^*\,MS_1\,[\longrightarrow_{p_{i_2}}]^*\,MS_2\,[\longrightarrow_{p_{i_3}}]^*\,\cdots\,[\longrightarrow_{p_{i_n}}]^*\,MS_n$, where MS0 = MSinit is the initial machine state and MS1, ..., MSn are consistent machine states. Then, MS0, ..., MSn are said to be reachable from MSinit.

Given an instruction label function J : P → L that maps each process p ∈ P to an instruction label in Lp, the reachability problem asks whether there exists a machine state of the form ((J, R), V, P, M, G) that is reachable from MSinit. A positive answer to this problem means that J is reachable in Prog under PS 2.0.

# **4 Undecidability of Consistent Reachability in PS 2.0**

The reachability problem is undecidable for PS 2.0 even for finite-state programs. The proof is by a reduction from Post's Correspondence Problem (PCP) [28]. A PCP instance consists of two sequences u1, ..., un and v1, ..., vn of non-empty words over some alphabet Σ. Checking whether there exists a sequence of indices j1, ..., jk ∈ {1, ..., n} s.t. $u_{j_1} \cdots u_{j_k} = v_{j_1} \cdots v_{j_k}$ is undecidable. Our proof works within the fragment of PS 2.0 having only relaxed (rlx) memory accesses and crucially uses unboundedly many promises to ensure that a process cannot skip any writes made by another process. We construct a concurrent program with two processes p1 and p2 over a finite data domain. The code of p1 is split into two modes: a generation mode (an if branch) and a validation mode (its else branch). The if branch is entered when the value of the boolean location validate is 0 (its initial value). We show that reaching the annotated instructions in p1 and p2 is possible iff the PCP instance has a solution. We give below an overview of the execution steps leading to the annotated instructions.
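For intuition, a small PCP instance can be checked by brute force (bounded search only, since PCP itself is undecidable; the function name is ours):

```python
from itertools import product

def pcp_solution(u, v, max_len=6):
    # naive bounded search for an index sequence j1..jk with
    # u[j1]...u[jk] == v[j1]...v[jk]; only sequences up to max_len
    # are explored, so a None result is inconclusive in general
    n = len(u)
    for k in range(1, max_len + 1):
        for js in product(range(n), repeat=k):
            if "".join(u[j] for j in js) == "".join(v[j] for j in js):
                return [j + 1 for j in js]  # 1-based indices
    return None
```

For example, the instance u = (ab, b), v = (a, bb) has the solution 1, 2, since ab·b = a·bb = abb.
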


are fulfilled if and only if this sequence is the same as the promised sequence $u_{i_1} \cdots u_{i_k}$. This happens only when i1, ..., ik is a PCP solution.

**–** At the end of promise fulfilment, p1 reaches its annotated instruction.

Our undecidability result is also tight in the sense that the reachability problem becomes decidable when we restrict ourselves to machine states in which the number of promises is bounded. Further, our proof is robust: it also goes through for PS 1.0 [16]. We call the fragment of PS 2.0 with only rlx memory accesses PS 2.0-rlx.

**Theorem 1.** The reachability problem for concurrent programs over a finite data domain is undecidable under PS 2.0-rlx.

# **5 Decidable Fragments of PS 2.0**

Since keeping ra memory accesses renders the reachability problem undecidable [1], and so does allowing unboundedly many promises with rlx memory accesses (Theorem 1), we address in this section the decidability of PS 2.0-rlx with a bounded number of promises in any reachable configuration. Bounding the number of promises in any reachable machine state does not imply that the total number of promises made during a run is bounded. Let bdPS 2.0-rlx denote the restriction of PS 2.0-rlx in which the number of promises in each reachable machine state is at most a given constant. Notice that the fragment bdPS 2.0-rlx subsumes the relaxed fragment of the RC11 model [20,16]. We assume here a finite data domain.

To establish the decidability of reachability for bdPS 2.0-rlx, we introduce an alternative memory model for concurrent programs called LoHoW (for "lossy higher order words"). We present the operational semantics of LoHoW and show that (1) PS 2.0-rlx is reachability-equivalent to LoHoW, and (2) under the bounded-promise assumption, reachability is decidable in LoHoW (and hence in bdPS 2.0-rlx).

**Introduction to** LoHoW**.** Given a concurrent program Prog, a state of LoHoW maintains a collection of higher order words, one per location of Prog, along with the states of all processes. The higher order word HWx corresponding to the location x is a word of simple words, representing the sub-memory M(x) in PS 2.0-rlx. Each simple word in HWx is an ordered sequence of "memory types", that is, messages or promises in M(x), maintained in the order of their to-timestamps in the memory. The word order between memory types in HWx thus represents the order induced by timestamps between the corresponding elements of M(x). The key information to encode in each memory type of HWx is: (1) whether it is a message (msg) or a promise (prm) in M(x); (2) the process (p) which added it to M(x) and the value (val) it holds; (3) the set S (called the pointer set) of processes that have seen this memory type in M(x); and (4) whether the adjacent time interval to the right of this memory type in M(x) has been reserved by some process.

**Memory Types.** To keep track of (1–4) above, a memory type is an element of Σ ∪ Γ, with Σ = {msg, prm} × Val × P × 2^P (for 1–3) and Γ = {msg, prm} × Val × P × 2^P × P (for 4). We write a memory type as (r, v, p, S, ?). Here, r indicates whether the type is a message (msg) or a promise (prm) in M(x), v is the value, p is the process that added the message/promise, and S is the pointer set of processes whose local view (on x) agrees with the to-timestamp of the message/promise. If the type is in Γ, the fifth component (?) is the id of the process that has reserved the time slot right-adjacent to the message/promise; ? is a wildcard that may or may not be matched.
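A memory type can be sketched as a small Python record (our naming; the wildcard ? becomes an optional field that is set exactly for types in Γ):

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

@dataclass(frozen=True)
class MemType:
    r: str                    # "msg" (message) or "prm" (promise)
    val: int                  # the value it holds
    p: str                    # the process that added it to M(x)
    S: FrozenSet[str]         # pointer set: processes whose view of x
                              # agrees with this type's to-timestamp
    reserved_by: Optional[str] = None  # set iff the type is in Gamma:
                                       # who reserved the right-adjacent slot

def in_sigma(m: MemType) -> bool:
    return m.reserved_by is None       # m is in Sigma

def in_gamma(m: MemType) -> bool:
    return m.reserved_by is not None   # m is in Gamma
```
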

**Simple Words.** A simple word is an element of Σ*#(Σ ∪ Γ), and each HWx is a word in (Σ*#(Σ ∪ Γ))+. Here, # is a special symbol not in Σ ∪ Γ that separates the last symbol of a simple word from the rest. Consecutive Σ-symbols in a simple word of HWx represent adjacent messages/promises in M(x), which are hence unavailable for RMWs. # does not correspond to any element of the memory; it only demarcates the last symbol of a simple word.

Fig. 3: A higher order word HW (black) with four embedded simple words (pink).

**Higher order words**. A higher order word is a sequence of simple words. Figure 3 depicts a higher order word with four simple words. We use a left-to-right order in both simple words and higher order words, and we extend the classical word indexation to higher order words in the straightforward manner. For example, the symbol at the third position of the higher order word HW in Figure 3 is HW[3] = (msg, 2, p, {p, q}). A higher order word HW is well-formed iff for every p ∈ P, there is a unique position i in HW having p in its pointer set; that is, HW[i] is of the form (−, −, −, S, ?) ∈ Σ ∪ Γ with p ∈ S. The higher order word given in Figure 3 is well-formed. We use ptr(p, HW) to denote the unique position i in HW having p in its pointer set. We assume that all the manipulated higher order words are well-formed.
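Well-formedness and ptr can be sketched as follows, representing a higher order word as a list of simple words, each a list of memory types carrying a pointer set `S` (a simplification of the formal definition; names are ours):

```python
def positions(hw):
    # flatten a higher order word (list of simple words, each a list of
    # memory types) into the sequence of its 1-based positions
    return [m for sw in hw for m in sw]

def well_formed(hw, procs):
    # each process must occur in the pointer set of exactly one position
    flat = positions(hw)
    return all(sum(p in m["S"] for m in flat) == 1 for p in procs)

def ptr(p, hw):
    # ptr(p, HW): the unique 1-based position whose pointer set contains p
    for i, m in enumerate(positions(hw), start=1):
        if p in m["S"]:
            return i
    raise ValueError(f"{p} has no pointer in the higher order word")
```
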

Fig. 4: Map from memories M(x), M(y) to higher order words HWx, HWy.

Each higher order word HWx represents the entire space [0, ∞) of available timestamps in M(x). Each simple word in HWx represents a timestamp interval (f, t], while consecutive simple words represent disjoint timestamp intervals (preserving their order). The memory types constituting each simple word occupy adjacent timestamp intervals, spanning the timestamp interval of the simple word. The adjacency of timestamp intervals within simple words is used in RMW steps and reservations. The last symbol of a simple word denotes a message/promise which, (1) if in Σ, is available for an RMW, and (2) if in Γ, is unavailable for an RMW since it is followed by a reservation. Symbols at positions other than the rightmost of a simple word represent messages/promises that are not available for RMWs. Figure 4 presents a mapping from a memory of PS 2.0-rlx to a collection of higher order words (one per location) in LoHoW.

**Initializing higher order words.** For each location x ∈ Loc, the initial higher order word $\mathsf{HW}_x^{init}$ consists of the single simple word #(msg, 0, p1, P), where P is the set of all processes and p1 is some process in P. The set of all higher order words $\mathsf{HW}_x^{init}$ for all locations x represents the initial memory of PS 2.0-rlx, where all locations have value 0 and all processes are aware of the initial message.

**Simulating PS 2.0 Memory Operations in** LoHoW**.** In the following, we describe how PS 2.0-rlx instructions are handled in LoHoW. Since we only have the rlx mode, we denote writes, reads, and RMWs by wt(x, v), rd(x, v), and U(x, vr, vw), dropping the access modes.

Reads. To simulate a rd(x, v) by a process p in LoHoW, we need an index j ≥ ptr(p, HWx) in HWx such that HWx[j] is a memory type with value v, i.e., of the form (−, v, −, S, ?) (? denotes that the type may be from Σ or Γ). The read is simulated by adding p to the pointer set S and removing it from its previous pointer set.

Fig. 5: Transformation of HW<sup>x</sup> on a read. (? denotes that type is from Σ or Γ)
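The pointer move performed by a read can be sketched on a flattened higher order word (only values and pointer sets are kept; a simplification of the full rule, with our own naming):

```python
def simulate_read(hw, p, v):
    # hw: flattened higher order word, a list of memory types
    # {"val": value, "S": pointer set}; the read needs a position
    # j >= ptr(p) whose type holds value v, and moves p's pointer there
    cur = next(i for i, m in enumerate(hw) if p in m["S"])
    for j in range(cur, len(hw)):
        if hw[j]["val"] == v:
            hw[cur]["S"].discard(p)  # remove p from its previous pointer set
            hw[j]["S"].add(p)        # p now agrees with hw[j]'s to-timestamp
            return True
    return False                     # no readable type with value v
```
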

Writes. A wt(x, v) by a process p (writing v to x) is simulated by adding a new msg type to HWx with a timestamp higher than the view of p for x, in one of two ways: (1) add the new simple word #(msg, v, p, {p}) to the right of the position ptr(p, HWx); or (2) pick a simple word w#α in HWx to the right of ptr(p, HWx) with α ∈ Σ, and modify it into wα#(msg, v, p, {p}). In both cases, p is removed from its previous pointer set.


Fig. 6: Transformation of HW<sup>x</sup> on a write. (? denotes that type is from Σ or Γ).

RMWs. Capturing an RMW is similar to executing a read followed by a write. In PS 2.0-rlx, a process p performing an RMW reads from a message with timestamp interval (−, t] and adds a message with timestamp interval (t, −]. Capturing this adjacency is what requires higher order words. Consider a U(x, vr, vw) step by a process p. Then, there must be a simple word in HWx having (−, vr, −, S) as its last memory type, at a position to the right of ptr(p, HWx). As usual, p is removed from its previous pointer set; #(−, vr, −, S) is replaced with (−, vr, −, S \ {p}), and #(msg, vw, p, {p}) is appended, extending the simple word with a new last symbol.

Promises, Reservations and Cancellations. A promise of value v for x made by a process p in PS 2.0-rlx is handled similarly to wt(x, v): we add the new simple word #(prm, v, p, {}) in HWx to the right of the position ptr(p, HWx), or append (prm, v, p, {}) at the end of a simple word at a position larger than ptr(p, HWx). The memory type has the tag prm (a promise), and its pointer set is empty (since making a promise does not lift the view of the promising process). Splitting the time interval of a promise is simulated in LoHoW by inserting a new memory type right before the corresponding promise memory type (prm, −, p, S), while fulfilment of a promise by a process p replaces (prm, v, p, S) with (msg, v, p, S ∪ {p}).

In PS 2.0-rlx, a process p makes a reservation by adding the pair (x, (f, t]) to the memory, given that there is a message/promise in the memory with timestamp interval (−, f]. In LoHoW, this is captured by "tagging" the rightmost memory type (message/promise) of a simple word with the name of the process making the reservation. This requires the memory types from Γ = {msg, prm} × Val × P × 2^P × P, whose last component stores the process that made the reservation. Such a memory type always appears at the end of a simple word and represents the fact that the timestamp interval right-adjacent to it has been reserved. Observe that nothing can be added to the right of a memory type of the form (msg, v, p, S, q); cancelling a reservation simply removes the tag again.


**Certification.** In PS 2.0-rlx, certification for a process p happens from the capped memory, where intermediate time slots (other than reserved ones) are blocked and any new message can be added only at the maximal timestamp. This is handled in LoHoW as follows: (1) the addition of new memory types is allowed only at the right end of any HWx, and (2) if the rightmost memory type in HWx is of the form (−, v, −, −, q) with q ≠ p (a reservation by q), then the word #(msg, v, q, {}) is appended at the end of HWx.

The memory is altered in PS 2.0-rlx during the certification phase to check for promise fulfilment, and at the end of the certification phase, we resume from the memory that was there before. To capture this in LoHoW, we work on a duplicate of (HWx)x∈Loc during the certification phase. Notice that the duplication allows losing, non-deterministically, empty memory types (memory types whose pointer set is empty) as well as redundant simple words (simple words consisting entirely of empty memory types). This copy of HWx is modified during certification and discarded once the certification phase finishes.

### **5.1 Formal Model of LoHoW**

In the following, we formally define LoHoW and state the equivalence of the reachability problem in PS 2.0-rlx and in LoHoW. For a memory type m = (r, v, p, S) (or m = (r, v, p, S, q)), we use m.value to denote v. For a memory type m = (r, v, p, S, ?) and a process p′ ∈ P, we define add(m, p′) ≡ (r, v, p, S ∪ {p′}, ?) and del(m, p′) ≡ (r, v, p, S \ {p′}, ?). This corresponds to the addition/deletion of the process p′ to/from the pointer set of m. Extending the above notation, given a higher order word HW, a position i ∈ {1, ..., |HW|}, and p ∈ P, we define add(HW, p, i) ≡ HW[1, i−1] · add(HW[i], p) · HW[i+1, |HW|], del(HW, p, i) ≡ HW[1, i−1] · del(HW[i], p) · HW[i+1, |HW|], and mov(HW, p, i) ≡ add(del(HW, p), p, i), where del(HW, p) abbreviates del(HW, p, ptr(p, HW)). This corresponds to the addition/deletion/relocation of the pointer of p in HW.
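These pointer operations can be sketched in Python (memory types as dicts with a pointer set `S`; the functions return fresh copies, mirroring the fact that the definitions build new words; names are ours):

```python
def del_ptr(hw, p):
    # del(HW, p): remove p from whichever pointer set currently contains it
    return [{**m, "S": m["S"] - {p}} for m in hw]

def add_ptr(hw, p, i):
    # add(HW, p, i): add p to the pointer set at 1-based position i
    return [dict(m, S=m["S"] | {p}) if k == i else m
            for k, m in enumerate(hw, start=1)]

def mov_ptr(hw, p, i):
    # mov(HW, p, i) = add(del(HW, p), p, i): relocate p's pointer to i
    return add_ptr(del_ptr(hw, p), p, i)
```
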

**Insertion into higher order words.** A higher order word HW can be extended in position 1 ≤ j ≤ |HW| with a memory type m = (r, v, p, {p}) as follows:

• Insertion as a new simple word is defined only if HW[j − 1] = # (i.e., position j is the end of a simple word). Let HW′ = del(HW, p) (i.e., HW after removing p from its previous pointer set). Then, the insertion of m results in

$$\mathsf{HW} \xleftarrow{N}_{j} m \equiv \mathsf{HW}'[1, j] \cdot \underbrace{\#(r, v, p, \{p\})}_{\text{new simple word}} \cdot \mathsf{HW}'[j+1, |\mathsf{HW}'|].$$

• Insertion at the end of a simple word is defined only if HW[j − 1] = # and HW[j] ∈ Σ (i.e., position j is the last memory type of its simple word and is free from reservations). Let HW′ = del(HW, p). For HW′ = w1 · #m′ · w2 with |w1 · #m′| = j, the insertion of m results in

$$\mathsf{HW} \xleftarrow{E}_{j} m \equiv w_{1} \cdot m' \cdot \underbrace{\#(r, v, p, \{p\})}_{m \text{ extends } m'} \cdot w_{2}$$

• Splitting a promise is defined only if m′ = HW[j] is of the form (prm, −, p, −, ?) (i.e., the memory type at position j is a promise made by p). Let HW′ = del(HW, p). Then,

$$\mathsf{HW} \xleftarrow{SP}_{j} m \equiv \begin{cases} \mathsf{HW}'[1, j-2] \cdot \underbrace{(r, v, p, \{p\}) \cdot \# m'}_{m \text{ splits } m'} \cdot \mathsf{HW}'[j+1, |\mathsf{HW}'|] & \text{if } \mathsf{HW}'[j-1] = \# \\ \mathsf{HW}'[1, j-1] \cdot \underbrace{(r, v, p, \{p\}) \cdot m'}_{m \text{ splits } m'} \cdot \mathsf{HW}'[j+1, |\mathsf{HW}'|] & \text{if } \mathsf{HW}'[j-1] \neq \# \end{cases}$$

Observe that in both cases we insert the new type m just before position j.
• Fulfilment of a promise is defined only if m′ = HW[j] is of the form (prm, v, p, S) or (prm, v, p, S, q). Let HW′ = del(HW, p). Then, the extended higher order word is

$$\mathsf{HW} \xleftarrow{FP}_{j} m \equiv \mathsf{HW}'[1, j-1] \cdot \underbrace{(\mathsf{msg}, v, p, S \cup \{p\}, ?)}_{m' \text{ is fulfilled by } p} \cdot \mathsf{HW}'[j+1, |\mathsf{HW}'|]$$

where ? is q if m′ = (prm, v, p, S, q) ∈ Γ, and is omitted if m′ = (prm, v, p, S) ∈ Σ.

**Making/Canceling a reservation.** A higher order word HW can also be modified by p by making/cancelling a reservation at a position 1 ≤ j ≤ |HW|. We define the operations Make(HW, p, j) and Cancel(HW, p, j) that reserve and cancel, respectively, a time slot at position j. Make(HW, p, j) (resp. Cancel(HW, p, j)) is only defined if HW[j] is of the form (r, v, q, S) (resp. (r, v, q, S, p)) and HW[j − 1] = #. Then, we have Make(HW, p, j) ≡ HW[1, j − 1] · (r, v, q, S, p) · HW[j + 1, |HW|] and Cancel(HW, p, j) ≡ HW[1, j − 1] · (r, v, q, S) · HW[j + 1, |HW|].
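Make/Cancel can be sketched as follows, keeping only the fields they inspect (an `is_last` flag stands for the condition HW[j − 1] = #; names and representation are ours):

```python
def make(hw, p, j):
    # reserve the slot to the right of position j (1-based): defined only
    # when hw[j] is the #-marked last type of its simple word and
    # carries no reservation yet
    m = hw[j - 1]
    if m.get("reserved_by") is not None or not m["is_last"]:
        return None  # Make(HW, p, j) undefined
    out = list(hw)
    out[j - 1] = dict(m, reserved_by=p)
    return out

def cancel(hw, p, j):
    # drop p's own reservation at position j
    m = hw[j - 1]
    if m.get("reserved_by") != p:
        return None  # Cancel(HW, p, j) undefined
    out = list(hw)
    out[j - 1] = dict(m, reserved_by=None)
    return out
```
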

**Process configuration in** LoHoW**.** A configuration of p ∈ P in LoHoW consists of a pair (σ, **HW**) where (1) σ is the process state maintaining the instruction label and the register values (see Section 3), and (2) **HW** is a mapping from the set of locations to higher order words. The transition relations $\xrightarrow{\mathsf{std}}_p$ and $\xrightarrow{\mathsf{cert}}_p$ between process configurations are given in Figure 7; the relation $\xrightarrow{\mathsf{cert}}_p$ is used only in the certification phase, while $\xrightarrow{\mathsf{std}}_p$ simulates the standard phase of PS 2.0-rlx. A read operation, in both phases, is handled by reading a value from a memory type to the right of the current pointer of p. A write operation, in the standard phase, can insert, to the right of the current pointer of p, a new memory type at the end of a simple word or as a new simple word. A memory type resulting from a write in the certification phase may only be inserted at the end of the higher order word or at a reserved slot (using the rule for splitting a reservation). A write can also fulfil a promise or split a promise (i.e., partially fulfil it) during both phases. Making/cancelling a reservation tags/untags a memory type at the end of a simple word to the right of the current pointer of p. An RMW is handled as a read followed by a write operation (whose resulting memory type must be inserted to the right of the read memory type). Finally, a promise can only be made during the standard phase, and the resulting memory type is inserted at the end of a simple word or as a new simple word to the right of the current pointer of p.


Fig. 7: A subset of LoHoW inference rules at the process level.

**Losses in** LoHoW**.** Let HW and HW′ be two higher order words in (Σ*#(Σ ∪ Γ))+, say HW = u1#a1 u2#a2 ... uk#ak and HW′ = v1#b1 v2#b2 ... vm#bm, with ui, vj ∈ Σ* and ai, bj ∈ Σ ∪ Γ. We extend the subword relation ⊑ to higher order words as follows: HW ⊑ HW′ iff there is a strictly increasing function f : {1, ..., k} → {1, ..., m} s.t. (1) ui ⊑ v_f(i) for all 1 ≤ i ≤ k, (2) ai = b_f(i), and (3) HW and HW′ contain the same number of memory types of the form (prm, −, −, −) or (prm, −, −, −, −). The relation ⊑ corresponds to the loss of some special empty memory types and redundant simple words (as explained earlier). The relation is extended to mappings from locations to higher order words as follows: **HW** ⊑ **HW**′ iff **HW**(x) ⊑ **HW**′(x) for all x ∈ Loc.
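The embedding check above can be sketched by brute force over the candidate functions f (symbols as `("msg"|"prm", value)` pairs; exponential and purely illustrative, with our own naming):

```python
from itertools import combinations

def sub_simple(u, v):
    # classical subword (scattered subsequence) test on simple-word bodies
    it = iter(v)
    return all(any(a == b for b in it) for a in u)

def sub_hw(hw1, hw2):
    # each higher order word is a list of (u_i, a_i) pairs: the body u_i
    # (a tuple of Sigma symbols) and the #-marked last symbol a_i;
    # hw1 embeds into hw2 via a strictly increasing f with u_i a subword
    # of v_f(i) and a_i == b_f(i), and promise counts must agree
    def promises(hw):
        return sum(sym[0] == "prm" for (u, a) in hw for sym in list(u) + [a])
    if promises(hw1) != promises(hw2):
        return False
    k, m = len(hw1), len(hw2)
    for f in combinations(range(m), k):
        if all(hw1[i][1] == hw2[f[i]][1]
               and sub_simple(hw1[i][0], hw2[f[i]][0])
               for i in range(k)):
            return True
    return False
```
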

LoHoW **states.** A LoHoW state st is a tuple ((J, R), **HW**) where J : P → L maps each process p to the label of the next instruction to be executed, R : Reg → Val maps each register to its current value, and **HW** is a mapping from locations to higher order words. The initial LoHoW state stinit is defined as ((Jinit, Rinit), **HW**init), where: (1) Jinit(p) is the label of the initial instruction of p; (2) Rinit(\$r) = 0 for every \$r ∈ Reg; and (3) **HW**init(x) = $\mathsf{HW}_x^{init}$ for all x ∈ Loc.

For two LoHoW states st = ((J, R), **HW**) and st′ = ((J′, R′), **HW**′) and a ∈ {std, cert}, we write st $\xrightarrow{a}_p$ st′ iff one of the following cases holds: (1) ((J(p), R), **HW**) $\xrightarrow{a}_p$ ((J′(p), R′), **HW**′) and J(p′) = J′(p′) for all p′ ≠ p, or (2) (J, R) = (J′, R′) and **HW**′ ⊑ **HW**.

**Two phases** LoHoW **states**. A two-phases state of LoHoW is S = (π, p, ststd, stcert), where π ∈ {cert, std} is a flag indicating whether the LoHoW is in the "standard" or the "certification" phase, p is the process evolving in one of these phases, and ststd, stcert are two LoHoW states (one for each phase). When the LoHoW is in the standard phase, ststd evolves; when it is in the certification phase, stcert evolves. A two-phases LoHoW state is said to be initial if it is of the form (std, p, stinit, stinit), where p ∈ P is any process. The transition relation → between two-phases LoHoW states is defined as follows: given S = (π, p, ststd, stcert) and S′ = (π′, p′, st′std, st′cert), we have S → S′ iff one of the following cases holds:


**– From the certification phase to the standard phase.** π = cert, π′ = std, st′std = ststd, st′cert = stcert, and stcert is of the form ((J, R), **HW**) where **HW**(x) does not contain any memory type of the form (prm, −, p, −, ?) for any x ∈ Loc (i.e., all promises made by p are fulfilled).

**The Reachability Problem in** LoHoW. Given an instruction label function J : P → L that maps each p ∈ P to a label in Lp, the reachability problem in LoHoW asks whether there exists a two-phases LoHoW state S of the form (std, −, ((J, R), **HW**), ((J′, R′), **HW**′)) s.t. (1) **HW**(x) and **HW**′(x) do not contain any memory type of the form (prm, −, p, −, ?) for any x ∈ Loc, and (2) S is reachable in LoHoW (i.e., S0 [−→]* S, where S0 is an initial two-phases LoHoW state). A positive answer to this problem means that J is reachable in Prog in LoHoW. The following theorem states the equivalence between LoHoW and PS 2.0-rlx in terms of reachable instruction label functions.

**Theorem 2.** An instruction label function J is reachable in a program Prog in LoHoW iff J is reachable in Prog in PS 2.0-rlx.

### **5.2 Decidability of LoHoW with Bounded Promises**

The equivalence of reachability in LoHoW and PS 2.0-rlx, coupled with Theorem 1, shows that reachability is undecidable in LoHoW. To recover decidability, we consider LoHoW with only a bounded number of promise memory types in any higher order word. Let K-LoHoW denote LoHoW with the number of promises bounded by K. (Observe that K-LoHoW corresponds to bdPS 2.0-rlx.)

**Theorem 3.** The reachability problem is decidable for K-LoHoW.

As a corollary of Theorem 3, the decidability of reachability follows for bdPS 2.0-rlx. The proof makes use of the framework of well-structured transition systems (WSTS) [7,13]. Next, we state that the reachability problem for K-LoHoW (even for K = 0) is highly non-trivial (i.e., non-primitive recursive). The proof is by a reduction from the reachability problem for lossy channel systems, similar to the case of TSO [8]: we insert SC-fence instructions everywhere in the process that simulates the lossy channel process (in order to ensure that no promises can be made by that process).

**Theorem 4.** The reachability problem for K-LoHoW is non-primitive recursive.

# **6 Source to Source Translation**

In this section, we propose an algorithmic approach for state reachability in concurrent programs under PS 2.0. We first recall the notion of view altering reads [1], and that of bounded contexts in SC [29].

View Altering Reads. A read from the memory is view altering if it changes the view of the process performing it. This means that the view in the message being read from was greater, on some variable, than the view of the reading process. The message being read from is in turn called a view altering message. A run in which the total number of view altering reads (across all processes) is bounded by some parameter is called a view-bounded run. The underapproximate analysis for PS 2.0-ra without promises and reservations [1] considered view-bounded runs.

Essential Events. An essential event in a run ρ of a program under PS 2.0 is a promise, a reservation, or a view altering read by some process in the run.

Bounded Context. A context is an uninterrupted sequence of actions by a single process. In a run having K contexts, the execution switches from one process to another K − 1 times. A K bounded context run is one in which the number of context switches is bounded by K ∈ N. The K bounded context reachability problem in SC checks for the existence of a K bounded context run reaching some chosen instruction. We now define the notion of bounding for PS 2.0.

**The Bounded Consistent Reachability Problem**. A run ρ of a concurrent program under PS 2.0, $MS_0 [\xrightarrow{p_{i_1}}]^* MS_1 [\xrightarrow{p_{i_2}}]^* MS_2 [\xrightarrow{p_{i_3}}]^* \cdots [\xrightarrow{p_{i_n}}]^* MS_n$, is called K bounded iff the number of essential events in ρ is ≤ K. The K bounded reachability problem for PS 2.0 checks for the existence of a run ρ of Prog which is K-bounded. Assuming Prog has n processes, we propose an algorithm that reduces the K bounded reachability problem to a K + n bounded context reachability problem of a program Prog′ under SC.
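To make the two bounding notions concrete, a run can be abstracted as a sequence of (process, event) pairs, and both bounds checked by a direct count. The following sketch uses illustrative event names, not the paper's syntax:

```python
def context_switches(run):
    """Number of context switches: adjacent events by different processes."""
    return sum(1 for a, b in zip(run, run[1:]) if a[0] != b[0])

# essential events: promises, reservations, and view altering reads
ESSENTIAL = {"promise", "reservation", "va_read"}

def is_k_bounded(run, k):
    """K-bounded: at most K essential events occur in the run."""
    return sum(1 for _, ev in run if ev in ESSENTIAL) <= k
```

For example, a run with events `[("p1", "write"), ("p1", "promise"), ("p2", "va_read"), ("p1", "read")]` has 2 context switches and 2 essential events.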

**Translation Overview**. We now provide a brief overview of the data structures and procedures utilized in our translation; the full details and correctness proof are in [5]. Let Prog be a concurrent program under PS 2.0 with set of processes P and locations Loc. Our algorithm relies on a source to source translation of Prog to a bounded context SC program Prog′, as shown in Figure 8, which operates on the same data domain (which need not be finite). The translation (1) adds a new process (Main) that initializes the global variables of Prog′, and (2) for each process p ∈ P adds local variables, which are initialized by the function InitProc.

Fig. 8: Source-to-source translation map

This is followed by the code block CSOp,λ0 (Context Switch Out) that optionally enables the process to switch out of context. For each λ-labeled instruction i in p, the translation transforms it into a sequence of instructions as follows: the code block CSI (Context Switch In) checks if the process is active in the current context; then each statement s of instruction i is transformed into a sequence of instructions following the per-statement translation map, and finally the code block CSOp,λ is executed. CSOp,λ facilitates two things when the process is at an instruction label λ: (1) it allows p to make promises/reservations after λ, such that the control is back at λ after certification; (2) it ensures that the machine state is consistent when p switches out of context. The translation of assume, if and while statements keeps the statement unchanged. The translation of read and write statements is described later. The translation of RMW statements is omitted for ease of presentation.

The set of promises a process makes has to be constrained with respect to the set of promises that it can certify. To address this, in the translation, processes run in two modes: a 'normal' mode and a 'check' (consistency check) mode. In the normal mode, a process does not make any promises or reservations. In the check mode, the process may make promises and reservations and subsequently certify them before switching out of context. In any context, a process first enters the normal mode, and then, before exiting the context, it enters the check mode. The check mode is used by the process to (1) make new promises/reservations and (2) certify consistency of the machine state. We also add an optional parameter, called certification depth (certDepth), which constrains the number of steps a process may take in the check mode to certify its promises. Figure 9 shows the structure of a translated run under SC.

Fig. 9: Control flow: In each context, a process runs first in normal mode n and then in consistency check mode cc. The transitions between these modes are facilitated by the CSO code block of the respective process. We check assertion failures for K + n context-bounded executions (j ≤ K + n).

To reduce a PS 2.0 run into a bounded context SC run, we use the bound on the number of essential events. From a run ρ in PS 2.0, we construct a K bounded run ρ′ in PS 2.0 in which the processes run in the order of generation of essential events. So the process which generates the first essential event is run first, until that event happens; then the process which generates the second essential event is run, and so on. This continues for K + n contexts: K bounds the number of essential events, and n ensures that all processes are run to completion. The bound on the number of essential events gives a bound on the number of timestamps that need to be maintained. As observed in [1], each view altering read requires two timestamps; additionally, each promise/reservation requires one timestamp. Since we have K such essential events, 2K timestamps suffice. We choose Time = {0, 1, 2,..., 2K} as the set of timestamps. Now we briefly give a high level overview of the translation.

**Data Structures**. The message data structure represents a message generated as a write or a promise and has 4 fields: (i) var, the address of the memory location written to; (ii) t, the timestamp in the view associated with the message; (iii) v, the value written; and (iv) flag, which keeps track of whether it is a message or a promise and, in case of a promise, which process it belongs to. The View data structure stores, for each memory location x, (i) a timestamp t ∈ Time, (ii) a value v written to x, and (iii) a Boolean l ∈ {true, false} representing whether t is an exact timestamp (which can be used for essential events) or an abstract timestamp (which corresponds to non-essential events).
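The two records can be sketched as follows (field names follow the description above; the concrete encoding in the tool may differ):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Message:
    var: str             # address of the memory location written to
    t: int               # timestamp in the view associated with the message
    v: int               # value written
    flag: Optional[str]  # None for an ordinary message; for a promise,
                         # the id of the process the promise belongs to

@dataclass
class ViewEntry:
    var_t: int    # timestamp t in Time = {0, ..., 2K}
    v: int        # value written to the location
    exact: bool   # True: exact timestamp (usable for essential events);
                  # False: abstract timestamp (non-essential events)
```

A View is then simply a map from each location x to its ViewEntry.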

**Global Variables**. The Memory is an array of size K holding elements of type message. This array is populated with the view altering messages, promises and reservations generated by the program. We maintain counters for (1) the number of elements in Memory; (2) the number of context switches that have occurred; and (3) the number of essential events that have occurred.

**Local Variables**. In addition to its local registers, each process has local variables including (1) a local variable view, which stores a local instance of the view function (this is of type View), (2) a flag denoting whether the process is running in the current context, and (3) a flag checkMode denoting whether the process is in the certification phase. We implement the certification phase as a function call, and hence store the process state and return address when entering it.

# **6.1 Translation Maps**

In what follows we illustrate how the translation simulates a run under PS 2.0. At the outset, recall that each process alternates, in its execution, between two modes: a normal mode (n in Figure 9) at the beginning of each context and the check mode at the end of the current context (cc in Figure 9), where it may make new promises and certify them before switching out of context.

**Context Switch Out (**CSOp,λ**).** We describe the CSO module; Algorithm 1 of Figure 10 provides its pseudocode. CSOp,λ is placed after each instruction label λ in the original program and serves as an entry and exit point for the consistency check phase of the process. When in normal mode (n) after some instruction λ, CSO non-deterministically guesses whether the process should exit the context at this point; if so, it sets the checkMode flag to true and subsequently saves its local state and the return address (to mark where to resume execution from, in the next context). The process then continues its execution in the consistency check mode (cc) from the current instruction label (λ) itself. Now the process may generate new promises (see Algorithm 1 of Figure 10) and certify these as well as earlier made promises. In order to conclude the check mode phase, the process will enter the CSO block at some different instruction label λ′. Now, since the checkMode flag is true, the process enters the else branch and verifies that there are no outstanding promises of p to be certified. Since the promises are not yet fulfilled, when p switches out of context, it has to mark all its promises uncertified; when the context returns to p, this marking will be used to fulfil the promises or to certify them again before the context switches out of p again. The process then exits the check mode phase, setting checkMode to false. Finally it loads the saved state, returns to the instruction label λ (where it entered check mode) and exits the context. Another process may now resume execution.
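The control flow of CSO can be sketched as a minimal model, with the non-deterministic exit guess passed in as an explicit parameter; the names and return values here are illustrative, not the paper's pseudocode:

```python
class ProcState:
    """Per-process state relevant to the CSO block (a sketch)."""
    def __init__(self):
        self.check_mode = False
        self.saved = None        # (local state, label) to resume from
        self.locals = {}
        self.label = 0
        self.promises = set()    # promises made and not yet fulfilled
        self.certified = set()   # promises certified in this check phase

    def cso(self, exit_guess):
        """Called after each instruction label; returns the control action."""
        if not self.check_mode:
            if not exit_guess:
                return "continue"          # stay in context, normal mode
            self.check_mode = True         # enter consistency-check mode at λ
            self.saved = (dict(self.locals), self.label)
            return "certify"
        # concluding the check phase at some other label λ'
        assert self.promises <= self.certified, "outstanding uncertified promise"
        self.certified = set()             # promises must be re-certified later
        self.check_mode = False
        self.locals, self.label = self.saved   # resume at λ in the next context
        return "switch_out"
```

The saved state makes the certification phase transparent: whatever the process does while certifying, control is back at λ when it switches out.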


Fig. 10: Algorithms for CSO and Write

**Write Statements**. The translation of a write instruction x := \$r with access mode o ∈ {rlx, ra} of a process p is given in Algorithm 2 of Figure 10. This is the general pseudocode for both kinds of memory accesses, with specific details pertaining to the particular access mode omitted. Let us first consider execution in the normal mode (i.e., checkMode is false). First, the process updates its local state with the value that it will write. Then, the process non-deterministically chooses one of three possibilities for the write: it either (i) does not assign a fresh timestamp (a non-essential event), (ii) assigns a fresh timestamp and adds the write to memory, or (iii) fulfils some outstanding promise.

Let us now consider a write executing when checkMode is true, and highlight the differences with the normal mode. In case (i), non-essential events exclude promises and reservations. Then, while in the certification phase, since we use a capped memory, the process can make a write if either (1) the write interval can be generated through splitting insertion or (2) the write can be certified with the help of a reservation. Basically, the writes we make either split an existing interval (adding the write to the left of a promise) or form part of a reservation; thus, the timestamp of a neighbour is used. In case (ii), when a fresh timestamp is used, the write is made as a promise and then certified before switching out of context. The analogue of case (iii) is the certification of promises for the current context; promise fulfilment happens only in the normal mode. To help a process decide the value of a promise, we use the fact that CBMC allows us to assign a non-deterministic value to a variable. On top of that, we have implemented an optimization that checks the set of possible values to be written in the future.
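The three normal-mode cases can be sketched as follows, with the non-deterministic choice resolved by an explicit parameter; the message encoding uses illustrative field names rather than the tool's actual code:

```python
def do_write(case, x, v, view, memory, K):
    """Normal-mode translated write of value v to location x (a sketch).
    view: x -> (timestamp, value); memory: list of essential messages."""
    if case == "local":
        # (i) no fresh timestamp: a non-essential write, only the value changes
        t, _ = view[x]
        view[x] = (t, v)
    elif case == "fresh":
        # (ii) fresh timestamp: the write becomes an essential message
        assert len(memory) < K, "essential-event bound exhausted"
        t = view[x][0] + 1
        memory.append({"var": x, "t": t, "v": v, "promise_of": None})
        view[x] = (t, v)
    elif case == "fulfil":
        # (iii) fulfil an outstanding promise with matching location and value
        m = next(m for m in memory
                 if m["var"] == x and m["promise_of"] is not None and m["v"] == v)
        m["promise_of"] = None   # the promise turns into an ordinary message
        view[x] = (m["t"], v)
```

In the real translation the choice between the three cases is left to CBMC's non-determinism; the assertion on the memory size is what enforces the essential-event bound.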

**Read Statements.** The translation of a read instruction \$r := x with access mode o ∈ {rlx, ra} of process p is given in Algorithm 3 of Figure 11.

The process first guesses whether it will read from a view altering message in the memory or from its local view. If it is the latter, the process must first verify whether it can read from the local view; for instance, reading from the local view may not be possible after execution of a fence instruction, when the timestamp of a variable x gets incremented from the local timestamp t to some t′ > t. In the case of a view altering read, we first check that we have not reached the context switching/essential event bound. Then the new message is fetched from Memory and we check that the view (timestamps) in the acquired message satisfies the conditions



Fig. 11: Algorithm for Read

imposed by the access type o ∈ {ra, rlx}. Finally, the process updates its view with that of the new message and increments the counters for the context switches and the essential events. Theorem 5 establishes the correctness of our translation.
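Analogously to the write case, the read translation can be sketched as follows; the guess is again an explicit parameter, and the per-access-mode timestamp checks are collapsed into a single comparison:

```python
def do_read(view_altering, x, view, memory, counters, K):
    """Translated read from location x (a sketch).
    view: x -> (timestamp, value); memory: list of essential messages."""
    if not view_altering:
        return view[x][1]                      # read from the local view
    assert counters["essential"] < K, "essential-event bound exhausted"
    # fetch a message for x whose timestamp is not behind the local view
    msg = next(m for m in memory
               if m["var"] == x and m["t"] >= view[x][0])
    view[x] = (msg["t"], msg["v"])             # acquire the message's view
    counters["essential"] += 1
    counters["contexts"] += 1
    return msg["v"]
```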

**Theorem 5.** Given a program Prog under PS 2.0 and K ∈ N, the source to source translation constructs a program Prog′ whose size is polynomial in Prog and K such that there is a K-bounded run of Prog under PS 2.0 reaching a set of instruction labels if and only if there is a (K + n)-bounded context run of Prog′ under SC that reaches the same set of instruction labels.

# **7 Implementation and Experimental Results**

In order to check the efficiency of the source-to-source translation, we implement a prototype tool, PS2SC, which is the first tool to handle PS 2.0. PS2SC takes as input a C program and a bound K and translates it to a program Prog′ to be run under SC. We use CBMC v5.10 as the backend verifier for Prog′; CBMC takes as input L, the loop unrolling parameter for bounded model checking. If PS2SC returns unsafe, then the program has an unsafe execution. Conversely, if it returns safe, then none of the executions within the explored subset violate any assertion; K may be iteratively incremented to increase the number of executions explored. PS2SC provides a partial-promises mode, which allows only a subset of the processes to make promises and serves as an effective under-approximation technique.

We now report the results of experiments we have performed with PS2SC. We have two objectives: (1) studying the performance of PS2SC on thin-air litmus tests and benchmarks utilizing promises, and (2) comparing PS2SC with other model checkers when operating in the promise-free mode. In the first case we show that PS2SC is able to uncover bugs in litmus tests and examples with few reads and writes to the shared memory. When this interaction, and the consequent non-determinism of PS 2.0, increases, we additionally enable partial promises. For the second case we compare PS2SC with three model checkers, CDSChecker [25], GenMC [18] and Rcmc [17], that support the promise-free subset of PS 2.0. Our observations highlight the ability to detect hard-to-find bugs with small K for unsafe benchmarks. We do not consider compilation time for any tool while reporting the results. For PS2SC, the time reported is the time taken by the CBMC backend for analysis. The timeout used is 1 hour for all benchmarks. All experiments are conducted on a machine with a 3.00 GHz Intel Core i5-3330 CPU and 8GB RAM running an Ubuntu-16 64-bit operating system. We denote timeout by 'TO', and memory limit exceeded by 'MLE'.

**Benchmarks Utilizing Promises.** In the following, we report the performance of PS2SC on litmus tests and parametrized tests.

Litmus Tests. We test PS2SC on litmus tests adapted from [16,22,11,23]. These examples are small programs that serve as barebones thin-air tests for the C11 memory model. Consistency tests based on the Java Memory Model are proposed in [23]; these were experimented on by [27] with their MRDer tool. Like MRDer, PS2SC is able to verify most of these tests within 1 minute, which shows its ability to handle typical programming idioms of PS 2.0 (see Table 1).

Parameterized Tests. In Table 2, we consider unsafe examples adapted from the Fibonacci-based benchmarks of SV-COMP 2019 [10]. In these examples a process is required to generate a promise (speculative write) whose value is the i-th Fibonacci number. This promise is certified using process-local reads. Thus, though the parameter i increases, the interaction of the promising process with the memory remains constant. The **CAS** variant requires the process to make use of reservations. We note that PS2SC uncovers the bugs effectively in these cases. In cases where the promise certificate requires reads from external processes, the amount of shared-memory


Table 1: Litmus Tests



interaction increases with i. In this case, we use partial promises.

How to recover tractable analysis? We note that though the above example consists of several processes interacting with the memory, the bug can be uncovered even if only a single process is allowed to make promising writes. We run PS2SC in the partial-promises mode, considering the case where only a single process generates promises, and PS2SC was able to uncover the bug. The results obtained are in Table 2, where PS2SC[1p] denotes that only one process is permitted to perform promises. We then repeat our experiments on other unsafe benchmarks - including ExponentialBug from Fig. 2 of [15] - and make similar observations. To summarize, the huge non-determinism of PS 2.0 can be tamed by using the modular approach of partial promises.

**Comparing with Other Tools.** In this section, we compare the performance of PS2SC in promise-free mode with CDSChecker [25], GenMC [18] and Rcmc [17] (which do not support promises). The main objective of this section is to provide evidence for the practicability of the essential-event-bounding technique. The results of this section indicate that the source-to-source translation with K essential-event bounding is effective at uncovering hard-to-find bugs in non-trivial programs. Additionally, we observe that in most examples considered, we had K ≤ 10. We provide here a subset of the experimental results; the remaining results are in the full version of the paper [5]. In the tables that follow we provide the value of K (for PS2SC) and the value of L (loop-unrolling bound) for all tools.

Parameterized Benchmarks. In Table 3, we experiment on two parametrized benchmarks:


ExponentialBug

Table 3: Parameterized benchmarks

(Fig. 2 of [15]) and Fibonacci (from SV-COMP 2019). In ExponentialBug(N), N is the number of writes made to a variable by a process. We note that in ExponentialBug(N) the number of executions grows as N!, while the processes have to follow a specific interleaving to uncover the hard-to-find bug. In Fibonacci(N), two processes compute the value of the N-th Fibonacci number in a distributed fashion.

Concurrent data structures based benchmarks. In Table 4, we consider benchmarks based on concurrent data structures. The first of these


### Table 4: Concurrent data structures

is a concurrent locking algorithm originating from [14]. The second, LinuxLocks(N), is adapted from the evaluations of CDSChecker [25]. We note that if not completely fenced, it is unsafe; we fence all but one lock access. Both results show the ability of our tool to uncover bugs with a small value of K.

Variations of mutual exclusion protocols. We consider variants of mutual exclusion protocols from SV-COMP 2019. The fully fenced versions of the protocols are safe. We modify these protocols by introducing bugs and comparing the performance of PS2SC for bug detection with the other tools. These benchmarks are parameterized by the number of processes. In Table 5, we unfence a single

process of the Peterson and Szymanski protocols making them unsafe. These are benchmarks petersonU(i) and szymanskiU(i) where i is the number of processes.

In petersonB(i), we keep all processes fenced but introduce a bug into the critical section of a process (write a value to a shared variable and read a different value from it). We note that the other tools do not scale, while


Table 5: Mutual exclusion benchmarks with a single unfenced process

PS2SC is able to detect the bug within one minute, showing that essential event-bounding is an effective under-approximation technique for bug-finding.

**Remark.** Through all these experiments, we observe that SMC tools and our tool try to tackle the same problem by using orthogonal approaches to finding bugs. Hence, through the experiments above we are not trying to pitch one approach against the other, but rather trying to highlight the differences in their features. We have exhibited examples where our tool is able to uncover hard-to-find bugs faster than the others with relatively small values of K.

# **8 Related Work and Conclusion**

Most of the existing verification work for C/C++ concurrency models concerns the development of stateless model checking coupled with dynamic partial order reduction (e.g., [6,17,18,26,25]), and does not handle the promising semantics. Context-bounding has been proposed in [29] for programs running under SC. This work has been extended in different directions and has led to efficient and scalable techniques for the analysis of concurrent programs (see e.g., [24,21,33,32,12,34]). In the context of weak memory models, context-bounded analyses have been proposed for TSO/PSO [9,31] and POWER [3].

The decidability of the verification problems for programs running under weak memory models has been addressed for TSO [8], RA [1], SRA [19], and POWER [2]. We believe that our proof techniques can be easily adapted to work with different variants of the promising semantics [16] (see [4]). For instance, in the code-to-code translation, the mechanism for making and certifying promises and reservations is isolated in one module, which can easily be changed to cover different variants of the promising semantics. Furthermore, the undecidability proof still goes through for [16]. Moreover, providing a tool for the verification of (among other things) litmus tests provides a valuable environment that can be used in further improvements of the promising semantics. To the best of our knowledge, this is the first time that this problem has been investigated for PS 2.0-rlx, and PS2SC is the first tool for automated verification of programs under PS 2.0. Finally, studying the decidability problem for related models that solve the thin-air problem (e.g., Paviotti et al. [27]) is interesting and kept as future work.

# **References**


Languages and Systems - 29th European Symposium on Programming, ESOP 2020, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2020, Dublin, Ireland, April 25-30, 2020, Proceedings. Lecture Notes in Computer Science, vol. 12075, pp. 599–625. Springer (2020)



# Data Flow Analysis of Asynchronous Systems using Infinite Abstract Domains

Snigdha Athaiya(✉)<sup>1</sup>, Raghavan Komondoor<sup>1</sup>, and K. Narayan Kumar<sup>2</sup>

> <sup>1</sup> Indian Institute of Science, Bengaluru, India {snigdha,raghavan}@iisc.ac.in
> <sup>2</sup> Chennai Mathematical Institute, Chennai, India kumar@cmi.ac.in

Abstract. Asynchronous message-passing systems are employed frequently to implement distributed mechanisms, protocols, and processes. This paper addresses the problem of precise data flow analysis for such systems. To obtain good precision, data flow analysis needs to somehow skip execution paths that read more messages than the number of messages sent so far in the path, as such paths are infeasible at run time. Existing data flow analysis techniques do elide a subset of such infeasible paths, but have the restriction that they admit only finite abstract analysis domains. In this paper we propose a generalization of these approaches to admit infinite abstract analysis domains, as such domains are commonly used in practice to obtain high precision. We have implemented our approach, and have analyzed its performance on a set of 14 benchmarks. On these benchmarks our tool obtains significantly higher precision compared to a baseline approach that does not elide any infeasible paths and to another baseline that elides infeasible paths but admits only finite abstract domains.

Keywords: Data Flow Analysis · Message-passing systems.

# 1 Introduction

Distributed software that communicates by asynchronous message passing is a very important software paradigm in today's world. It is employed in varied domains, such as distributed protocols and workflows, event-driven systems, and UI-based systems. Popular languages used in this domain include Go (https://golang.org/), Akka (https://akka.io/), and P (https://github.com/p-org).

Analysis and verification of asynchronous systems is an important problem, and poses a rich set of challenges. The research community has focused historically on a variety of approaches to tackle this overall problem, such as model checking and systematic concurrency testing [25,13], formal verification to check properties such as reachability or coverability of states [41,3,2,21,18,31,19,1], and data flow analysis [29].

Data flow analysis [32,30] is a specific type of verification technique that propagates values from an abstract domain while accounting for all paths in a

© The Author(s) 2021

N. Yoshida (Ed.): ESOP 2021, LNCS 12648, pp. 30–58, 2021. https://doi.org/10.1007/978-3-030-72019-3_2

program. It can hence be used to check whether a property or assertion always holds. The existing verification and data flow analysis approaches mentioned earlier have a major limitation, which is that they admit only finite abstract domains. This, in general, limits the classes of properties that can be successfully verified. On the other hand, data flow analysis of sequential programs using infinite abstract domains, e.g., constant propagation [32], interval analysis [12], and octagons [44], is a well developed area, and is routinely employed in verification settings. In this paper we seek to bridge this fundamental gap, and develop a precise data flow analysis framework for message-passing asynchronous systems that admits infinite abstract domains.

### 1.1 Motivating Example: Leader election

```
 1: max := process number; send <1, max>
 2: Process is in active mode
 3: while true do
 4:     if process is in passive mode then
 5:         receive a mesg and send this same mesg
 6:     else if message <1, i> arrives then
 7:         if i != max then
 8:             Send message <2, i>; left := i
 9:         else
10:             Declare max as the global maximum
11:             nr_leaders++; assert(nr_leaders = 1)
12:     else if message <2, j> arrives then
13:         if left > j and left > max then
14:             max := left
15:             Send message <1, max>
16:         else
17:             Process enters passive mode
```

[Partial-run diagram omitted: four processes in a ring; the channel from Process 2 to Process 1 holds the pending messages ⟨1,4⟩, ⟨2,4⟩, ⟨1,2⟩.]

Fig. 1. Pseudo-code of each process in leader election, and a partial run

To motivate our work we use a benchmark program<sup>3</sup> in the Promela language [25] that implements a leader election protocol [17]. In the protocol there is a ring of processes, and each process has a unique number. The objective is to discover the "leader", which is the process with the maximum number. The pseudo-code of each process in the protocol is shown in the left side of Figure 1. Each process has its own copy of local variables max and left, whereas nr\_leaders is a global variable that is common to all the processes (its initial value is zero). Each process sends messages to the next process in the ring via an unbounded FIFO channel. Each process becomes "ready" whenever a message is available for it to receive, and at any step of the protocol any one ready process (chosen

<sup>3</sup> file assertion.leader.prm in www.imm.dtu.dk/~albl/promela-models.zip.

non-deterministically) executes one iteration of its "while" loop. (We formalize these execution rules in a more general fashion in Section 2.1.) The messages are 2-tuples ⟨x, i⟩, where x can be 1 or 2, and 1 ≤ i ≤ max. The right side of Figure 1 shows a snapshot at an intermediate point during a run of the protocol. Each dashed arrow between two nodes represents a send of a message and a (completed) receipt of the same message. The block arrow depicts the channel from Process 2 to Process 1, which happens to contain three sent (but still unreceived) messages.

It is notable that in any run of the protocol, Lines 10-11 happen to get executed only by the actual leader process, and that too, exactly once. Hence, the assertion never fails. The argument for this claim is not straightforward, and we refer the reader to the paper [17] for the details.
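The claim can be cross-checked on small rings by simulating the pseudocode of Figure 1 directly. The sketch below schedules ready processes in round-robin order (a real run picks any ready process non-deterministically) and returns the number of leader declarations together with the declared maximum:

```python
from collections import deque

def leader_election(ids):
    """Round-robin simulation of the leader election protocol (a sketch)."""
    n = len(ids)
    chan = [deque() for _ in range(n)]   # chan[k]: incoming FIFO of process k
    maxv, left = list(ids), [None] * n
    passive = [False] * n
    nr_leaders, leader_value = 0, None
    for k in range(n):                   # line 1: send <1, max>
        chan[(k + 1) % n].append((1, maxv[k]))
    while any(chan):
        for k in range(n):
            if not chan[k]:
                continue
            x, i = chan[k].popleft()
            send = chan[(k + 1) % n].append
            if passive[k]:               # lines 4-5: forward the message
                send((x, i))
            elif x == 1:
                if i != maxv[k]:         # lines 7-8
                    send((2, i))
                    left[k] = i
                else:                    # lines 10-11: own value came back
                    nr_leaders += 1
                    leader_value = maxv[k]
                    assert nr_leaders == 1
            elif left[k] > i and left[k] > maxv[k]:   # lines 13-15
                maxv[k] = left[k]
                send((1, maxv[k]))
            else:                        # line 17
                passive[k] = True
    return nr_leaders, leader_value
```

On every ring tried, exactly one declaration occurs and the declared value is the global maximum, consistent with the claim; of course the simulation explores only one schedule, whereas the analyses discussed below must account for all of them.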

### 1.2 Challenges in property checking

Data flow analysis could be used to verify the assertion in the example above, e.g., using the Constant Propagation (CP) abstract domain. This analysis determines at each program point whether each variable has a fixed value, and if yes, the value itself, across all runs that reach the point. In the example in Figure 1, all actual runs of the system that happen to reach Line 10 come there with value zero for the global variable nr\_leaders.

A challenge for data flow analysis on message-passing systems is that there may exist infeasible paths in the system. These are paths with more receives of a certain message than the number of copies of this message that have been sent so far. For instance, consider the path that consists of two back-to-back iterations of the "while" loop by the leader process, both times through Lines 3,6,9-11. This path is not feasible, due to the impossibility of having two copies of the message ⟨1, max⟩ in the input channel [17]. The second iteration would bring the value 1 for nr\_leaders to Line 10; the analysis would thus infer a non-constant value and hence declare the assertion as failing (which would be a false positive).

Hence, it is imperative in the interest of precision for any data flow analysis or verification approach to track the channel contents as part of the exploration of the state space. Tracking the contents of unbounded channels precisely is known to be undecidable even when solving problems such as reachability and coverability (which are simpler than data flow analysis). Hence, existing approaches either bound the channels (which in general causes unsoundness), or use sound abstractions such as unordered channels (also known as the Petri Net or VASS abstraction) or lossy channels. Such abstractions suffice to elide a subset of infeasible paths. In our running example, the unordered channel abstraction happens to suffice to elide infeasible paths that could contribute to a false positive at the point of the assertion. However, the analysis would need to use an abstract domain such as CP to track the values of integer variables. This is an infinite domain (due to the infinite number of integers). The most closely related previous dataflow analysis approach for distributed systems [29] does use the unordered channel abstraction, but does not admit infinite abstract domains, and hence cannot verify assertions such as the one in the example above.

### 1.3 Our Contributions

This paper is the first one to the best of our knowledge to propose an approach for data flow analysis for asynchronous message-passing systems that (a) admits infinite abstract domains, (b) uses a reasonably precise channel abstraction among the ones known in the literature (namely, the unordered channels abstraction), and (c) computes maximally precise results possible under the selected channel abstraction. Every other approach we are aware of exhibits a strict subset of the three attributes listed above. It is notable that previous approaches do tackle the infinite state space induced by the unbounded channel contents. However, they either do not reason about variable values at all, or only allow variables that are based on finite domains.

Our primary contribution is an approach that we call Backward DFAS. This approach is maximally precise, and admits a class of infinite abstract domains. This class includes well-known examples such as Linear Constant Propagation (LCP) [51] and Affine Relationships Analysis (ARA) [46], but does not include the full CP analysis. We also propose another approach, which we call Forward DFAS, which admits a broader class of abstract domains, but is not guaranteed to be maximally precise on all programs.

We describe a prototype implementation of both our approaches. On a set of 14 real benchmarks, which are small but involve many complex idioms and paths, our tool verifies approximately 50% more assertions than our implementation of the baseline approach [29].

The rest of the paper is structured as follows. Section 2 covers the background and notation that will be assumed throughout the paper. We present the Backward DFAS approach in Section 3, and the Forward DFAS approach in Section 4. Section 5 discusses our implementation and evaluation. Section 6 discusses related work, and Section 7 concludes the paper.

# 2 Background and Terminology

Vector addition systems with states or VASS [27] are a popular modelling technique for distributed systems. We begin this section by defining an extension to VASS, which we call a VASS-Control Flow Graph or VCFG.

Definition 1. A VASS-Control Flow Graph or VCFG G is a graph, described by the tuple ⟨Q, δ, r, q0, V, π, θ⟩, where Q is a finite set of nodes, δ ⊆ Q × Q is a finite set of edges, r ∈ N, q0 is the start node, V is a set of variables or memory locations, π : δ → A maps each edge to an action, where A ≡ ((V → Z) → (V → Z)), and θ : δ → Z^r maps each edge to a vector in Z^r.

For any edge e = (q1, q2) ∈ δ, if π(e) = a and θ(e) = w, then a is called the action of e and w is called the queuing vector of e. This edge is depicted as q1 −(a,w)→ q2. The variables and the actions are the only additional features of a VCFG over VASS.

A configuration of a VCFG is a tuple ⟨q, c, ξ⟩, where q ∈ Q, c ∈ N^r and ξ ∈ (V → Z). The initial configuration of a VCFG is ⟨q0, **0**, ξ0⟩, where **0** denotes a vector of r zeroes, and ξ0 is a given initial valuation for the variables. The VCFG can be said to have r counters; the vector c in each configuration can be thought of as a valuation of the counters. The transitions between VCFG configurations are according to the rule below:

$$\frac{e = (q\_1, q\_2), \ e \in \delta, \ \pi(e) = a, \ \theta(e) = w, \ a(\xi\_1) = \xi\_2, \ c\_1 + w = c\_2, \ c\_2 \ge \mathbf{0}}{\langle q\_1, c\_1, \xi\_1 \rangle \Rightarrow\_e \langle q\_2, c\_2, \xi\_2 \rangle}$$
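For concreteness, the transition rule above can be sketched in code as follows. This is an illustrative sketch only; the names `Config` and `step` are ours, not from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

@dataclass(frozen=True)
class Config:
    q: str                           # current VCFG node
    c: Tuple[int, ...]               # counter valuation (must stay >= 0)
    xi: Tuple[Tuple[str, int], ...]  # variable valuation V -> Z, frozen as pairs

def step(cfg: Config, edge: Tuple[str, str],
         pi: Dict[Tuple[str, str], Callable[[Dict[str, int]], Dict[str, int]]],
         theta: Dict[Tuple[str, str], Tuple[int, ...]]):
    """Fire edge e = (q1, q2): apply the action pi(e) to the variables and
    add the queuing vector theta(e) to the counters; the move is permitted
    only if the resulting counter vector c2 satisfies c2 >= 0."""
    q1, q2 = edge
    if cfg.q != q1:
        return None
    c2 = tuple(ci + wi for ci, wi in zip(cfg.c, theta[edge]))
    if any(x < 0 for x in c2):       # premise c2 >= 0 fails: edge blocked
        return None
    xi2 = pi[edge](dict(cfg.xi))
    return Config(q2, c2, tuple(sorted(xi2.items())))
```

In this encoding a send edge carries a +1 entry in theta(e) and a receive edge a −1 entry, so a receive is blocked exactly when its counter is zero.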

### 2.1 Modeling of Asynchronous Message Passing Systems as VCFGs

Asynchronous systems are composed of a finite number of independently executing processes that communicate with each other by passing messages along FIFO channels. The processes may have local variables, and there may exist shared (or global) variables as well. For simplicity of presentation we assume all variables are global.

Fig. 2. (a) Asynchronous system with two processes, (b) its VCFG model

Figure 2(a) shows a simple asynchronous system with two processes. In this system there are two channels, c1 and c2, and a message alphabet consisting of two elements, m1 and m2. The semantics we assume for message-passing systems is the same as that used by the tool Spin [25]. A configuration of the system consists of the current control states of all the processes, the contents of all the channels, and the values of all the variables. A single transition of the system consists of a transition of one of the processes from its current control state to a successor control state, accompanied by the corresponding queuing operation or variable-update action. A transition labeled c!m can be taken unconditionally, and results in 'm' being appended to the tail of the channel 'c'. A transition labeled c?m can be taken only if an instance of 'm' is available at the head

of 'c', and results in this instance getting removed from 'c'. (Note that, depending on the context, we overload the term "message" to mean either an element of the message alphabet, or an instance of a message-alphabet element in a channel at run time.)

Asynchronous systems can be modeled as VCFGs, and our approach performs data flow analysis on VCFGs. We now illustrate how an asynchronous system can be modeled as a VCFG, assuming a fixed number of processes in the system. We do this illustration using the example VCFG in Figure 2(b), which models the system in Figure 2(a). Each node of the VCFG represents a tuple of control states of the processes, while each edge corresponds to a transition of the system. The action of a VCFG edge is identical to the action that labels the corresponding process transition ("id" in Figure 2(b) represents the identity action). The VCFG has as many counters as there are unique pairs (ci, mj) such that the operation ci!mj is performed by some process. If an edge e in the VCFG corresponds to a send transition ci!mj of the system, then e's queuing vector has a +1 for the counter corresponding to (ci, mj) and a zero for all the other counters. Analogously, a receive operation is modeled as a -1 in the queuing vector. In Figure 2(b), the first counter is for (c1, m1) while the second counter is for (c2, m2). Note that the +1 and -1 encoding (which is inherited from VASS) effectively causes FIFO channels to be treated as unordered channels.

When each process can invoke procedures as part of its execution, such systems can be modeled using inter-procedural VCFGs, or iVCFGs. These are extensions of VCFGs just as standard inter-procedural control-flow graphs are extensions of control-flow graphs. Constructing an iVCFG for a given system is straightforward, under a restriction that at most one of the processes in the system can be executing a procedure other than its main procedure at any time. This restriction is also present in other related work [29,5].

### 2.2 Data flow analysis over iVCFGs

Data flow analysis is based on a given complete lattice L, which serves as the abstract domain. As a pre-requisite step before we can perform our data flow analysis on iVCFGs, we first consider each edge e (with action a and queuing vector w) in each procedure of the iVCFG, and replace the (concrete) action a with an abstract action f, where f : L → L is a given abstract transfer function that conservatively over-approximates [12] the behavior of the concrete action a.

Let p be a path in an iVCFG, let p0 be the first node in the path, and let ξi be a valuation of the variables at the beginning of p. The path p is said to be feasible if, starting from the configuration ⟨p0, **0**, ξi⟩, the configuration ⟨q, d, ξ⟩ obtained at each successive point in the path is such that d ≥ **0**, with successive configurations along the path being generated as per the rule for transitions among VCFG configurations that was given before Section 2.1. For any path p = e1 e2 ... ek of an iVCFG, we define its path transfer function ptf(p) as f_ek ◦ f_ek−1 ◦ ... ◦ f_e1, where f_e is the abstract action associated with edge e.
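These two definitions can be rendered as a short sketch; the list-of-queuing-vectors encoding of a path is our simplification.

```python
def is_feasible(path_w, r):
    """A path is feasible iff, starting from all-zero counters, adding the
    queuing vectors edge by edge never drives any counter below zero."""
    c = [0] * r
    for w in path_w:
        c = [ci + wi for ci, wi in zip(c, w)]
        if any(x < 0 for x in c):
            return False
    return True

def ptf(abstract_actions):
    """ptf(p) = f_ek o ... o f_e1: compose the abstract actions of the
    edges e1 ... ek, applying the earliest edge's action first."""
    def composed(d):
        for f in abstract_actions:   # abstract_actions = [f_e1, ..., f_ek]
            d = f(d)
        return d
    return composed
```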

Fig. 3. Example iVCFG

The standard data flow analysis problem for sequential programs is to compute the join-over-all-paths (JOP) solution. Our problem statement is to compute the join-over-all-feasible-paths (JOFP) solution for iVCFGs. Formally stated, if start is the entry node of the "main" procedure of the iVCFG, given any node target in any procedure of the iVCFG, and an "entry" value d0 ∈ L at start such that d0 conservatively over-approximates ξ0, we wish to compute the JOFP value at target as defined by the following expression:

$$\bigsqcup \{\, (ptf(p))(d\_0) \mid p \text{ is a feasible and interprocedurally valid path in the iVCFG from } start \text{ to } target \,\}$$

Intuitively, due to the unordered channel abstraction, every run of the system corresponds to a feasible path in the iVCFG, but not vice versa. Hence, the JOFP solution above is guaranteed to conservatively over-approximate the JOP solution on the runs of the system (which is not computable in general).

# 3 Backward DFAS Approach

In this section we present our key contribution – the Backward DFAS (Data Flow Analysis of Asynchronous Systems) algorithm – an interprocedural algorithm that computes the precise JOFP at any given node of the iVCFG.

We begin by presenting a running example, which is the iVCFG with two procedures depicted in Figure 3. There is only one channel and one message in the message alphabet in this example, and hence the queuing vectors associated with the edges are of size 1. Edges without vectors are implicitly associated with zero vectors. The actions associated with edges are represented in the form of assignment statements; edges without assignment statements next to them have identity actions. The upper part of Figure 3, consisting of nodes a, b, p, q, h, i, j, k, l, is the VCFG of the "main" procedure. The remaining nodes constitute the VCFG of the (tail-)recursive procedure foo. The solid edges are intra-procedural edges, while the dashed edges are inter-procedural edges.

Throughout this section we use Linear Constant Propagation (LCP) [51] as our example data flow analysis. LCP, like CP, aims to identify the variables that have constant values at any given location in the system. LCP is based on the same infinite domain as CP; i.e., each abstract domain element is a mapping from variables to (integer) values. The "⊑" relation for the LCP lattice is also defined in the same way as for CP. The encoding of the transfer functions in LCP is as follows. Each edge (resp. path) maps the outgoing value of each variable to either a constant, or to a linear expression in the incoming value of at most one variable of the edge (resp. path), or to a special symbol that indicates an unknown outgoing value. For instance, for the edge g → m in Figure 3, its transfer function can be represented symbolically as (t'=t, x'=x+1, y'=y, z'=z), where the primed versions represent outgoing values and unprimed versions represent incoming values.
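The symbolic encoding just described composes mechanically. The sketch below shows one possible representation; the triple `(a, src, b)` standing for `a*src + b`, and all names used, are ours, not the paper's.

```python
UNKNOWN = "?"   # the special symbol for an unknown outgoing value

def compose(f, g):
    """Return the LCP transfer function 'g after f': each variable's
    symbolic output in g is rewritten in terms of f's inputs.
    Constants are encoded as (0, None, b); f must define every variable."""
    h = {}
    for v, rhs in g.items():
        if rhs == UNKNOWN:
            h[v] = UNKNOWN
            continue
        a, src, b = rhs              # outgoing value of v in g is a*src + b
        if src is None:              # constant: unaffected by f
            h[v] = (0, None, b)
            continue
        prev = f[src]
        if prev == UNKNOWN:
            h[v] = UNKNOWN
        else:
            pa, psrc, pb = prev      # src equals pa*psrc + pb on entry to g
            if psrc is None:         # src is a constant pb before g
                h[v] = (0, None, a * pb + b)
            else:
                h[v] = (a * pa, psrc, a * pb + b)
    return h
```

Joining two such functions (needed for covering checks) maps disagreeing entries to UNKNOWN; we omit it here for brevity.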

Say we wish to compute the JOFP at node k. The only feasible paths that reach node k are the ones that attain calling-depth of three or more in the procedure foo, and hence encounter at least three send operations, which are required to clear the three receive operations encountered from node h to node k. All such paths happen to bring the constant values (t = 1, z = 1) to the node k. Hence, (t = 1, z = 1) is the precise JOFP result at node k. However, infeasible paths, if not elided, can introduce imprecision. For instance, the path that directly goes from node c to node o in the outermost call to the Procedure foo (this path is of calling-depth zero) brings values of zero for all four variables, and would hence prevent the precise fact (t = 1, z = 1) from being inferred.

### 3.1 Assumptions and Definitions

The set of all L → L transfer functions clearly forms a complete lattice under the following ordering: f1 ⊑ f2 iff for all d ∈ L, f1(d) ⊑ f2(d). Backward DFAS makes a few assumptions on this lattice of transfer functions. The first is that this lattice be of finite height; i.e., all strictly ascending chains of elements in this lattice are finite (although no a priori bound on the sizes of these chains is required). The second is that a representation of transfer functions is available, as are operators to compose, join, and compare transfer functions. Note that the two assumptions above are also made by the classical "functional" inter-procedural approach of Sharir and Pnueli [55]. Thirdly, we need distributivity, defined as follows: for any f1, f2, f ∈ L → L, (f1 ⊔ f2) ◦ f = (f1 ◦ f) ⊔ (f2 ◦ f). The distributivity assumption is required only if the given system contains recursive procedure calls.

Linear Constant Propagation (LCP) [51] and Affine Relationships Analysis (ARA) [46] are well-known examples of analyses based on infinite abstract domains that satisfy all of the assumptions listed above. Note that the CP transfer-functions lattice is not of finite height. Despite the LCP abstract domain being the same as the CP abstract domain, the encoding chosen for LCP transfer functions (which was mentioned above) ensures that LCP uses a strict, finite-height subset of the full CP transfer-functions lattice that is closed under join and function composition. The trade-off is that LCP transfer functions for assignment statements whose RHS is not a linear expression, and for conditionals, are less precise than the corresponding CP transfer functions.

Our final assumption is that procedures other than "main" may send messages, but should not have any "receive" operations. Previous approaches that have addressed data flow analysis or verification problems for asynchronous systems with recursive procedures also have the same restriction [54,29,19].

We now introduce important terminology. The demand of a given path p in the VCFG is a vector of size r, and is defined as follows:

$$demand(p) = \begin{cases} \max(\mathbf{0} - w, \mathbf{0}), & \text{if } p = (v \xrightarrow{f, w} z) \\ \max(demand(p') - w, \mathbf{0}), \text{if } p = (e.p'), \text{where } e \equiv (v \xrightarrow{f, w} z) \end{cases}$$

Intuitively, the demand of a path p is the minimum required vector of counter values in any starting configuration at the entry of the path for there to exist a sequence of transitions among configurations that manages to traverse the entire path (following the rule given before Section 2.1). It is easy to see that a path p is feasible iff demand(p) = **0**.
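The recursive definition of demand amounts to a right-to-left scan over the path's queuing vectors, as this small sketch shows (the vector-list encoding of a path is ours):

```python
def demand(path_w, r):
    """demand(p): scan the queuing vectors right to left, taking
    d := max(d - w, 0) componentwise; the demand of the empty suffix is 0."""
    d = [0] * r
    for w in reversed(path_w):
        d = [max(di - wi, 0) for di, wi in zip(d, w)]
    return d
```

For the path hijk of Figure 3, which performs three receives on the single channel, this yields demand 3; a path is feasible iff its demand is the zero vector.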

A set of paths C is said to cover a path p iff: (a) all paths in C have the same start and end nodes (respectively) as p, (b) for each p′ ∈ C, demand(p′) ≤ demand(p), and (c) (⊔_{p′∈C} ptf(p′)) ⊒ ptf(p). (Regarding (b), any binary vector operation in this paper is defined as applying the same operation on every pair of corresponding entries, i.e., point-wise.)

A path template (p1, p2, ..., pn) of any procedure Fi is a sequence of paths in the VCFG of Fi such that: (a) path p1 begins at the entry node of Fi and path pn ends at the return node of Fi, (b) for all pi, 1 ≤ i < n, pi ends at a call-site node, and (c) for all pi, 1 < i ≤ n, pi begins at a return-site node v^i_r such that v^i_r corresponds to the call-site node v^{i−1}_c at which p_{i−1} ends.

### 3.2 Properties of Demand and Covering

At a high level, Backward DFAS works by growing paths in the backward direction by a single edge at a time starting from the target node (node k in our example in Figure 3). Every time this process results in a path reaching the start node (node a in our example), and the path is feasible, the approach simply transfers the entry value d<sup>0</sup> via this path to the target node. The main challenge is that due to the presence of cycles and recursion, there are an infinite number of feasible paths in general. In this subsection we present a set of lemmas that embody our intuition on how a finite subset of the set of all paths can be enumerated such that the join of the values brought by these paths is equal to the JOFP. We then present our complete approach in Section 3.3.

Demand Coverage Lemma: Let p2 and p′2 be two paths from a node vi to a node vj such that demand(p′2) ≤ demand(p2). If p1 is any path ending at vi, then demand(p1.p′2) ≤ demand(p1.p2). □

This lemma can be argued using induction on the length of path p1. A similar observation has been used to solve coverability of lossy channels and well-structured transition systems in general [3,18,2]. An important corollary of this lemma is that for any two paths p′2 and p2 from vi to vj such that demand(p′2) ≤ demand(p2), if there exists a path p1 ending at vi such that p1.p2 is feasible, then p1.p′2 is also feasible.

Function Coverage Lemma: Let p2 be a path from a node vi to a node vj, and P2 be a set of paths from vi to vj such that (⊔_{p′2∈P2} ptf(p′2)) ⊒ ptf(p2). Let p1 be any path ending at vi and p3 be any path beginning at vj. Under the distributivity assumption stated in Section 3.1, the following property holds: (⊔_{p′2∈P2} ptf(p1.p′2.p3)) ⊒ ptf(p1.p2.p3). □

The following result follows from the Demand and Function Coverage Lemmas and from monotonicity of the transfer functions:

Corollary 1: Let p2 be a path from a node vi to a node vj, and P2 be a set of paths from vi to vj such that P2 covers p2. Let p1 be any path ending at vi. Then, the set of paths {p1.p′2 | p′2 ∈ P2} covers the path p1.p2. □

We now use the running example from Figure 3 to illustrate how we leverage Corollary 1 in our approach. When we grow paths in the backward direction from the target node k, two candidate paths that get enumerated (among others) are pi ≡ hijk and pj ≡ hijkhijk (in that order). Now, {pi} covers pj. Therefore, by Corollary 1, any backward extension p1.pj of pj (where p1 is any path prefix) is guaranteed to be covered by the analogous backward extension p1.pi of pi. By the definition of covering, it follows that p1.pi brings in a data value that conservatively over-approximates the value brought in by p1.pj. Therefore, our approach discards pj as soon as it gets enumerated. To summarize, our approach discards any path as soon as it is enumerated if it is covered by some subset of the previously enumerated and retained paths.

Due to the finite height of the transfer functions lattice, and because demand vectors cannot contain negative values, at some point in the algorithm every new path that can be generated by backward extension at that point would be discarded immediately. At this point the approach would terminate, and soundness would be guaranteed by definition of covering.

In the inter-procedural setting the situation is more complex. We first present two lemmas that set the stage. Both lemmas crucially make use of the assumption that recursive procedures are not allowed to have "receive" operations. For any path pa that contains no receive operations, and for any demand vector d, we first define supply(pa, d) as min(s, d), where s is the sum of the queuing vectors of the edges of pa.
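Under the no-receives assumption, supply is just the capped sum of the queuing vectors; a sketch with our list-of-vectors encoding:

```python
def supply(path_w, d):
    """supply(pa, d) = min(s, d) componentwise, where s is the sum of the
    queuing vectors of pa; pa is assumed to contain no receive operations."""
    s = [sum(w[i] for w in path_w) for i in range(len(d))]
    return [min(si, di) for si, di in zip(s, d)]
```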

Supply Limit Lemma: Let p1, p2 be two paths from vi to vj such that there are no receive operations in p1 and p2. Let pb be any path beginning at vj. If demand(pb) = d, and if supply(p1, d) ≥ supply(p2, d), then demand(p1.pb) ≤ demand(p2.pb). □

A set of paths P is said to d-supply-cover a path pa iff: (a) all paths in P have the same start node and same end node (respectively) as pa, (b) (⊔_{p′∈P} ptf(p′)) ⊒ ptf(pa), and (c) for each p′ ∈ P, supply(p′, d) ≥ supply(pa, d).

Supply Coverage Lemma: If pa.pb is a path, and demand(pb) = d, and if a set of paths P d-supply-covers pa, and pa as well as all paths in P have no receive operations, then the set of paths {p′.pb | p′ ∈ P} covers the path pa.pb.

Proof argument: Since P d-supply-covers pa, by the Supply Limit Lemma, we have (a): for all p′ ∈ P, demand(p′.pb) ≤ demand(pa.pb). Since P d-supply-covers pa, we also have (⊔_{p′∈P} ptf(p′)) ⊒ ptf(pa). From this, we use the Function Coverage Lemma to infer that (b): (⊔_{p′∈P} ptf(p′.pb)) ⊒ ptf(pa.pb). The result now follows from (a) and (b). □

Consider the path hijk in our example, which gets enumerated and retained (as discussed earlier). This path gets extended back as qhijk; let us denote this path as p′. Let d be the demand of p′ (which is equal to 3). Our plan now is to extend this path in the backward direction all the way up to node p, by prepending interprocedurally valid and complete (i.e., IVC) paths of procedure foo in front of p′. An IVC path is one that begins at the entry node of foo, ends at the return node of foo, is of arbitrary calling depth, has balanced calls and returns, and has no pending returns when it completes [50]. First, we enumerate the IVC path(s) with calling-depth zero (i.e., path co in the example), and prepend them in front of p′. We then produce deeper IVC paths, in phases. In each phase i, i > 0, we inline the IVC paths of calling-depth i − 1 that have been enumerated and retained so far into the path templates of the procedure to generate IVC paths of calling-depth i, and prepend these IVC paths in front of p′. We terminate when each IVC path generated in a particular phase j is d-supply-covered by some subset P of the IVC paths generated in previous phases.

The soundness of discarding the IVC paths of phase j follows from the Supply Coverage Lemma (p′ takes the place of pb in the lemma's statement, while a path generated in phase j takes the place of pa). The termination condition is guaranteed to be reached eventually, because: (a) the supplies of all IVC paths generated are limited to d, and (b) the lattice of transfer functions is of finite height. Intuitively, we could devise a sound termination condition even though deeper and deeper IVC paths can increment the counters more and more, because a deeper IVC path that increments the counters beyond the demand of p′ does not result in lower overall demand when prepended before p′ than a shallower IVC path that also happens to meet the demand of p′ (the Supply Limit Lemma formalizes this).

In our running example, for the path qhijk, whose demand is equal to three, prefix generation happens to terminate in the fifth phase. The IVC paths that get generated in the five phases are, respectively, p0 = co, p1 = cdefgmcono, p2 = (cdefgm)^2co(no)^2, p3 = (cdefgm)^3co(no)^3, p4 = (cdefgm)^4co(no)^4, and p5 = (cdefgm)^5co(no)^5. Here, supply(p3, 3) = supply(p4, 3) = supply(p5, 3) = 3. The LCP transfer functions of these paths are as follows: ptf(p3) is (t'=1, x'=x+3, y'=x+2, z'=1), ptf(p4) is (t'=1, x'=x+4, y'=x+3, z'=1), and ptf(p5) is (t'=1, x'=x+5, y'=x+4, z'=1). Hence, {p3, p4} 3-supply-covers p5.

We also need a result stating that when the IVC paths in the jth phase are d-supply-covered by paths generated in preceding phases, then the IVC paths that would be generated in the (j + 1)th phase would also be d-supply-covered by paths generated


in phases that preceded j. This can be shown using a variant of the Supply Coverage Lemma, which we omit in the interest of space. Once this is shown, it then follows inductively that none of the phases after phase j are required, which would imply that it would be safe to terminate.

The arguments presented above were in a restricted setting, namely, that there is only one call in each procedure, and that only recursive calls are allowed. These restrictions were assumed only for simplicity, and are not actually assumed in the algorithm to be presented below.

### 3.3 Data Flow Analysis Algorithm

Our approach is summarized in Algorithm 1. ComputeJOFP is the main routine. The algorithm works on a given iVCFG (which is an implicit parameter to the algorithm), and is given a target node at which the JOFP is to be computed.



A key data structure in the algorithm is sPaths; for any node v, sPaths(v) is the set of paths from v to target that the algorithm has generated and retained so far. The workList at any point stores a subset of the paths in sPaths; these are the paths of the iVCFG that still need to be extended backward.

To begin with, all edges incident onto target are generated and added to the sets sPaths and workList (Line 4 in Algorithm 1). In each step the algorithm picks up a path p from workList (Line 6), and extends this path in the backward direction. The backward extension has three cases based on the start node of the path p. The simplest case is the intra-procedural case, wherein the path is extended backwards in all possible ways by a single edge (Lines 21-23). The routine Covered, whose definition is not shown in the algorithm, checks if its first argument (a path) is covered by its second argument (a set of paths). Note, covered paths are not retained.

When the start node of p is the entry node of a procedure F1 (Lines 14-19), the path is extended backwards via all possible call-site-to-entry edges for procedure F1.

If the starting node of path p is a return-site node v1 (Lines 8-13) in a calling procedure, we invoke the routine ComputeEndToEnd (Line 10 of Algorithm 1). This routine, which we explain later, returns a set of IVC paths of the called procedure such that every IVC path of the called procedure is d-supply-covered by some subset of paths in the returned set, where d denotes demand(p). These returned IVC paths are prepended before p (Line 11), with the call edge e1 and return edge r1 appropriately inserted.

The final result returned by the algorithm (see Lines 25 and 26 in Algorithm 1) is the join of the values transferred by the zero-demand paths (i.e., feasible paths), starting from the given entry value d0 ∈ L.
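To convey the shape of the main loop, the following is a highly simplified, intra-procedural rendering of ComputeJOFP. This is our sketch, not the paper's Algorithm 1: the `covered` and `apply_ptf` parameters stand in for the covering check and the transfer of d0 along a path, and the zero-demand filtering of Lines 25-26 is elided.

```python
def backward_dfas_intra(edges, target, start, d0, covered, apply_ptf):
    """Grow paths backward from target one edge at a time, discarding any
    newly built path that is covered by already-retained paths.
    In the full algorithm the covering check also guarantees termination."""
    s_paths = {}                     # node -> retained paths from node to target
    worklist = []
    for (u, v) in edges:             # edges incident onto target (Line 4)
        if v == target:
            p = [(u, v)]
            s_paths.setdefault(u, []).append(p)
            worklist.append(p)
    while worklist:                  # pick a path to extend (Line 6)
        p = worklist.pop()
        head = p[0][0]
        for (u, v) in edges:         # intra-procedural backward extension
            if v != head:
                continue
            p2 = [(u, v)] + p
            if covered(p2, s_paths.get(u, [])):
                continue             # covered paths are not retained
            s_paths.setdefault(u, []).append(p2)
            worklist.append(p2)
    # values carried by retained paths from start (feasibility filter omitted)
    return [apply_ptf(p, d0) for p in s_paths.get(start, [])]
```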

Routine ComputeEndToEnd: This routine is specified in Algorithm 2, and is basically a generalization of the approach that we described in Section 3.2, now handling multiple call-sites in each procedure, mutual recursion, calls to non-recursive procedures, etc. We do assume, for simplicity of presentation, that there are no cycles (i.e., loops) in the procedures, as this results in a fixed number of path templates in each procedure. There is no loss of generality here because we allow recursion. The routine incrementally populates a group of sets – there is a set named sIVCPaths(Fi, d) for each procedure Fi in the system. The idea is that when the routine completes, sIVCPaths(Fi, d) will contain a set of IVC paths of Fi that d-supply-cover all IVC paths of Fi. Note that we simultaneously populate covering sets for all the procedures in the system in order to handle mutual recursion.

The routine ComputeEndToEnd first enumerates and saves all zero-depth paths in all procedures (see Line 3 in Algorithm 2). The routine then iteratively takes one path template at a time, and fills in the "holes" between corresponding (call-site, return-site) pairs of the form (v^{i−1}_c, v^i_r) in the path template with IVC paths of the procedure called from this pair of nodes, thus generating a deeper IVC path (see the loop in Lines 6-11). A newly generated IVC path p is retained only if it is not d-supply-covered by other IVC paths already generated for the current procedure Fi (Lines 10-11). The routine terminates when no more retainable IVC paths are generated, and returns the set sIVCPaths(F, d).

### 3.4 Illustration

We now illustrate our approach using the example in Figure 3. Algorithm 1 would start from the target node k, and would grow paths one edge at a time. After four steps the path hijk would be added to sPaths(h) (the intermediate steps would add suffixes of this path to sPaths(i), sPaths(j), and sPaths(k)). Next, path khijk would be generated and discarded, because it is covered by the "root" path k. Hence, further iterations of the cycle are avoided. On the other hand, the path hijk would get extended back to node q, resulting in path qhijk being retained in sPaths(q). This path would trigger a call to routine ComputeEndToEnd. As discussed in Section 3.2, this routine would return the following set of paths: p0 = co, and pi = (cdefgm)^i co(no)^i for each 1 ≤ i ≤ 4. (Recall, as discussed in Section 3.2, that (cdefgm)^5co(no)^5 and deeper IVC paths are 3-supply-covered by the paths {p3, p4}.)

Each of the paths returned above by the routine ComputeEndToEnd would be prepended in front of qhijk, with the corresponding call and return edges inserted appropriately. These paths would then be extended back to node a. Hence, the final set of paths in sPaths(a) would be abpcoqhijk, abpcdefgmconoqhijk, abp(cdefgm)^2co(no)^2qhijk, abp(cdefgm)^3co(no)^3qhijk, and abp(cdefgm)^4co(no)^4qhijk. Of these

paths, the first two are ignored, as they are not feasible. The initial data-flow value (in which all variables are non-constant) is sent via the remaining three paths. In all these three paths the final values of variables 't' and 'z' are one. Hence, these two constants are inferred at node k.

### 3.5 Properties of the algorithm

We provide argument sketches here about the key properties of Backward DFAS. Detailed proofs are available in the appendix that accompanies this paper [4].

Termination. The argument is by contradiction. For the algorithm to not terminate, one of the following two scenarios must occur. The first is that an infinite sequence of paths gets added to some set sPaths(v). By Higman's lemma it follows that embedded within this infinite sequence there is an infinite sequence p1, p2, ..., such that for all i, demand(pi) ≤ demand(pi+1). Because the algorithm never adds covered paths, it follows that for all i: ⊔_{1≤k≤i+1} ptf(pk) ⊐ ⊔_{1≤k≤i} ptf(pk); that is, the joins form a strictly ascending chain. However, this contradicts the assumption that the lattice of transfer functions is of finite height. The second scenario is that an infinite sequence of IVC paths gets added to some set sIVCPaths(F, d) for some procedure F and some demand vector d in some call to the routine ComputeEndToEnd. Because the "supply" values of the IVC paths are bounded by d, embedded within the infinite sequence just mentioned there must exist an infinite sequence of paths p1, p2, ..., such that for all i, supply(pi, d) ≥ supply(pi+1, d). However, since d-supply-covered paths are never added, it follows that for all i: ⊔_{1≤k≤i+1} ptf(pk) ⊐ ⊔_{1≤k≤i} ptf(pk), which again contradicts the finite height of the transfer-function lattice.

Soundness and Precision. We already argued informally in Section 3.2 that the algorithm explores all feasible paths in the system, omitting only paths that are covered by other already-retained paths. By definition of covering, this is sufficient to guarantee over-approximation of the JOFP. The converse direction, namely, under-approximation, is obvious to see as every path along which the data flow value d<sup>0</sup> is sent at the end of the algorithm is a feasible path. Together, these two results imply that the algorithm is guaranteed to compute the precise JOFP.

Complexity. We show the complexity of our approach in the single-procedure setting. Our analysis follows along the lines of the analysis of the backwards algorithm for coverability in VASS [6]. The overall idea is to use the technique of Rackoff [48] to derive a bound on the length of the paths that need to be considered. We derive a complexity bound of O(Δ·h^2·**L**^{2r+1}·r·log(**L**)), where Δ is the total number of transitions in the VCFG, Q is the number of VCFG nodes, h is the height of the lattice of L → L functions, and **L** = (Q·(h+1)·2)^{(3r)!+1}.

# 4 Forward DFAS Approach

The Backward DFAS approach, though precise, requires the transfer function lattice to be of finite height. Due to this restriction, infinite-height abstract domains like Octagons [44], which need widening [12], are not accommodated by Backward DFAS. To address this, we present the Forward DFAS approach, which admits any complete lattice as an abstract domain (if the lattice is of infinite height then a widening operator should also be provided). The trade-off is precision. Forward DFAS elides only some of the infeasible paths in the VCFG, and hence, in general, computes a conservative over-approximation of the JOFP. Forward DFAS is conceptually not as sophisticated as Backward DFAS, but is still a novel proposal from the perspective of the literature.

The Forward DFAS approach is structured as an instantiation of Kildall's data flow analysis framework [32]. This framework needs a given complete lattice, the elements of which are propagated around the VCFG as part of the fixpoint computation. Let L be the given underlying finite or infinite complete lattice. L either needs to have no infinite ascending chains (e.g., Constant Propagation), or L needs to have an associated widening operator $\nabla_L$. The complete lattice D that we use in our instantiation of Kildall's framework is defined as $D \equiv D_{r,\kappa} \to L$, where κ ≥ 0 is a user-given non-negative integer, and $D_{r,\kappa}$ is the set of all vectors of size r (where r is the number of counters in the VCFG) whose entries are integers in the range [0, κ]. The ordering on this lattice is as follows: $(d_1 \in D) \sqsubseteq (d_2 \in D)$ iff $\forall c \in D_{r,\kappa}.\ d_1(c) \sqsubseteq_L d_2(c)$. If a widening operator $\nabla_L$ has been provided for L, we define a widening operator for D as follows: $d_1 \nabla d_2 \equiv \lambda c \in D_{r,\kappa}.\ d_1(c) \nabla_L d_2(c)$.
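As a concrete reading of this definition, the following sketch (our own illustration; the toy lattice and all names are assumptions, not the paper's code) builds $D_{r,\kappa} \to L$ for a miniature constant-propagation lattice L and lifts its order and join pointwise; a widening $\nabla_L$ would be lifted in exactly the same way.

```python
from itertools import product

def cp_join(a, b):
    """Join in a toy constant-propagation lattice L:
    None = bottom, "T" = top (unknown), an int = a known constant."""
    if a is None: return b
    if b is None: return a
    return a if a == b else "T"

def vectors(r, kappa):
    """D_{r,kappa}: all vectors of size r with integer entries in [0, kappa]."""
    return list(product(range(kappa + 1), repeat=r))

def d_leq(d1, d2, r, kappa):
    """d1 <= d2 in D iff d1(c) <=_L d2(c) for every vector c."""
    def cp_leq(a, b):
        return a is None or a == b or b == "T"
    return all(cp_leq(d1[c], d2[c]) for c in vectors(r, kappa))

def d_join(d1, d2, r, kappa):
    """Pointwise join on D; a widening nabla_L would be lifted identically."""
    return {c: cp_join(d1[c], d2[c]) for c in vectors(r, kappa)}

r, kappa = 2, 1
bot = {c: None for c in vectors(r, kappa)}
d1 = dict(bot); d1[(0, 0)] = 5
d2 = dict(bot); d2[(0, 0)] = 7
print(d_join(d1, d2, r, kappa)[(0, 0)])  # two disagreeing constants join to "T"
```

An element of D is represented as a dictionary from counter vectors to L-values, which makes the pointwise lifting of the order and join immediate.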

We now need to define the abstract transfer functions with signature D → D for the VCFG edges, to be used within the data flow analysis. As an intermediate step to this end, we define a ternary relation boundedMove1 as follows. Any triple of integers (p, q, s) ∈ boundedMove1 iff

$$\begin{array}{lr}
(0 \le p \le \kappa)\ \wedge & \\
\quad (\ (q \ge 0 \wedge p + q \le \kappa \wedge s = p + q) & (a)\\
\ \vee\ (q \ge 0 \wedge p + q > \kappa \wedge s = \kappa) & (b)\\
\ \vee\ (q < 0 \wedge p = \kappa \wedge 0 \le s \le \kappa \wedge \kappa - s \le -q) & (c)\\
\ \vee\ (q < 0 \wedge p < \kappa \wedge p + q \ge 0 \wedge s = p + q)\ ) & (d)
\end{array}$$

We now define a ternary relation boundedMove on vectors. A triple of vectors (c1, c2, c3) belongs to relation boundedMove iff all three vectors are of the same size, and for each index i, (c1[i], c2[i], c3[i]) ∈ boundedMove1 .
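The four disjuncts translate directly into code. The sketch below (our transcription, with hypothetical function names) checks membership in boundedMove1 and lifts it pointwise to vectors as boundedMove:

```python
def bounded_move1(p, q, s, kappa):
    """(p, q, s) in boundedMove1: p is the current bounded count, q the
    change induced by the edge, s the resulting bounded count."""
    if not (0 <= p <= kappa):
        return False
    return (
        (q >= 0 and p + q <= kappa and s == p + q) or        # (a) send, stays below bound
        (q >= 0 and p + q > kappa and s == kappa) or         # (b) send, count saturates at kappa
        (q < 0 and p == kappa and 0 <= s <= kappa
             and kappa - s <= -q) or                         # (c) receive from a saturated count
        (q < 0 and p < kappa and p + q >= 0 and s == p + q)  # (d) receive from a precise count
    )

def bounded_move(c1, c2, c3, kappa):
    """Lift boundedMove1 pointwise to same-sized vectors."""
    return (len(c1) == len(c2) == len(c3) and
            all(bounded_move1(p, q, s, kappa)
                for p, q, s in zip(c1, c2, c3)))

kappa = 3
print(bounded_move1(2, 2, 3, kappa))   # True: case (b), 2+2 > 3, saturate to kappa
print(bounded_move1(1, -2, 0, kappa))  # False: case (d) fails since 1-2 < 0, receive blocked
```

Note how case (c) is nondeterministic: a saturated count κ can map to several result counts s, reflecting that κ stands for "κ or more" messages.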

We now define the D → D transfer function for the VCFG edge $q_1 \xrightarrow{f,\,w} q_2$ as follows:

$$\operatorname{fun}(l \in D) \equiv \lambda c_2 \in D_{r,\kappa}.\ \left( \bigsqcup_{c_1 \text{ such that } (c_1,\, w,\, c_2) \,\in\, \operatorname{boundedMove}} f(l(c_1)) \right)$$

Finally, let $l_0$ denote the following function: $\lambda c \in D_{r,\kappa}.$ if c is **0** then $d_0$ else ⊥, where $d_0 \in L$. We can now invoke Kildall's algorithm using the fun transfer functions defined above at all VCFG edges, using $l_0$ as the fact at the "entry" to the "main" procedure. After Kildall's algorithm has finished computing the fixpoint solution, if $l^D_v \in D$ is the fixpoint solution at any node v, we return the value $\bigsqcup_{c \in D_{r,\kappa}} l^D_v(c)$ as the final result at v.
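To make the whole pipeline concrete, the following self-contained sketch (our own toy construction, not the authors' tool) runs a Kildall-style worklist iteration with the fun transfer functions on a hypothetical two-edge VCFG: a send edge that sets x to 4, followed by a receive edge. L is a toy constant-propagation lattice (None for ⊥, "T" for unknown), with κ = 1 and a single counter.

```python
from itertools import product
from functools import reduce

KAPPA, R = 1, 1                     # bound on counts, number of counters
VECS = list(product(range(KAPPA + 1), repeat=R))

def cp_join(a, b):
    """Join in a toy constant-propagation lattice: None = bottom, "T" = unknown."""
    if a is None: return b
    if b is None: return a
    return a if a == b else "T"

def bounded_move1(p, q, s, k=KAPPA):
    return (0 <= p <= k) and (
        (q >= 0 and p + q <= k and s == p + q) or
        (q >= 0 and p + q > k and s == k) or
        (q < 0 and p == k and 0 <= s <= k and k - s <= -q) or
        (q < 0 and p < k and p + q >= 0 and s == p + q))

def transfer(l, f, w):
    """fun(l) = lambda c2. join over {c1 | (c1, w, c2) in boundedMove} of f(l(c1))."""
    out = {}
    for c2 in VECS:
        acc = None
        for c1 in VECS:
            if l[c1] is None:       # bottom: unreachable with these counts
                continue
            if all(bounded_move1(p, q, s) for p, q, s in zip(c1, w, c2)):
                acc = cp_join(acc, f(l[c1]))
        out[c2] = acc
    return out

# Tiny VCFG: entry --x:=4, send (w=+1)--> n1 --receive (w=-1), identity--> n2
edges = [("entry", "n1", lambda v: 4, (1,)),
         ("n1", "n2", lambda v: v, (-1,))]
facts = {n: {c: None for c in VECS} for n in ("entry", "n1", "n2")}
facts["entry"][(0,)] = "T"          # l0: initial fact d0 at counter vector 0
work = ["entry"]
while work:                          # chaotic iteration to a fixpoint
    n = work.pop()
    for src, dst, f, w in edges:
        if src != n:
            continue
        t = transfer(facts[src], f, w)
        new = {c: cp_join(facts[dst][c], t[c]) for c in VECS}
        if new != facts[dst]:
            facts[dst] = new
            work.append(dst)

# Final answer at n2: join over all bounded queue configurations
print(reduce(cp_join, (facts["n2"][c] for c in VECS)))  # -> 4
```

The receive edge can only fire on configurations actually reachable via the send, so the constant 4 survives to node n2; a path that received before sending would be blocked by boundedMove.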

The intuition behind the approach above is as follows. If v is a vector in the set $D_{r,\kappa}$, and if (c, m) is a channel-message pair, then the value in the (c, m)th slot of v encodes the number of instances of message m currently in channel c. An important note is that if this value is κ, it actually indicates that there are κ or more instances of message m in channel c, whereas a value less than κ represents itself exactly. Hence, we refer to vectors in $D_{r,\kappa}$ as bounded queue configurations. If d ∈ D is a data flow fact that holds at a node of the VCFG after data flow analysis terminates, then for any $v \in D_{r,\kappa}$, if d(v) = l, it indicates that l is a conservative over-approximation of the join of the data flow facts brought by all feasible paths reaching the node whose ending counter values are as indicated by v (in the sense just described).

The relation boundedMove is responsible for blocking propagation along some of the infeasible paths. The intuition behind it is as follows. Consider a VCFG edge $q_1 \xrightarrow{f:L \to L,\ w} q_2$. If $c_1$ is a bounded queue configuration at node $q_1$, then $c_1$, upon propagation via this edge, becomes a bounded queue configuration $c_2$ at $q_2$ iff $(c_1, w, c_2) \in$ boundedMove. Lines (a) and (b) in the definition of boundedMove1 correspond to sending a message; line (b) essentially throws away the precise count when the number of messages in the channel goes above κ. Line (c) corresponds to receiving a message when all we know is that the number of messages currently in the channel is greater than or equal to κ. Line (d) is key for precision when the channel has fewer than κ messages, as it allows a receive operation to proceed only if the requisite number of messages is present in the channel.

The formulation above extends naturally to inter-procedural VCFGs using generic inter-procedural frameworks such as the call strings approach [55]. We omit the details of this in the interest of space.

Properties of the approach: Since Forward DFAS is an instantiation of Kildall's algorithm, it inherits its properties from that framework. As the set $D_{r,\kappa}$ is finite, it is easy to see that the fixpoint algorithm will terminate.

To argue the soundness of the algorithm, we consider the concrete lattice $D_c \equiv D_r \to L$, and the following "concrete" transfer function for the VCFG edge $q_1 \xrightarrow{f,\,w} q_2$: $\operatorname{fun\_conc}(l \in D_c) \equiv \lambda c_2 \in D_r.\ \bigsqcup_{c_1 \in D_r \text{ such that } c_1 + w = c_2} f(l(c_1))$, where $D_r$ is the set of all vectors of size r of natural numbers. We then argue that the abstract transfer function fun defined earlier is a consistent abstraction [12] of fun_conc. This soundness argument is given in detail in the appendix that accompanies this paper [4].

If we restrict our discussion to single-procedure systems, the complexity of our approach is just the complexity of applying Kildall's algorithm. This works out to $O(Q^2 \cdot \kappa^r \cdot h)$, where Q is the number of VCFG nodes, and h is either the height of the lattice L or the length of the maximum increasing sequence of values from L obtainable at any point when using L in conjunction with Kildall's algorithm and the given widening operator $\nabla_L$.

Fig. 4. Data flow facts over a run of the algorithm

Illustration: We illustrate Forward DFAS using the example in Figure 3. Figure 4 depicts the data flow values at four selected nodes as they get updated over eight selected points of time during the run of the algorithm. In this illustration we assume a context-insensitive analysis for simplicity (it so happens that context sensitivity does not matter in this specific example). We use the value κ = 3. Each small table is a data flow fact, i.e., an element of $D \equiv D_{r,\kappa} \to L$. The top-left cell in the table shows the node at which the fact arises. In each row the first column shows the counter value, while the remaining columns depict the known constant value of the variables (⊤ indicates unknown). Here are some interesting things to note. When any tuple of constant values transfers along the path from node c to node m, the constant values get updated due to the assignment statements encountered, and this tuple shifts from counter i to counter i + 1 (if i is not already equal to κ) due to the "send" operation encountered. When we transition from Step (5) to Step (6) in the figure, we get ⊤'s, as counter values 2 and 3 in Step (5) both map to counter value 3 in Step (6) due to κ being 3 (hence, the constant values get joined). The value at node o (in Step (7)) is the join of values from Steps (5) and (6). Finally, when the value at node o propagates to node k, the tuple of constants associated with counter value 3 ends up getting mapped to all lower values as well, due to the receive operations encountered.

Note that the precision of our approach generally increases with the value of κ (as does the running time). For instance, if κ is set to 2 (rather than 3) in the example, some more infeasible paths would be traversed, and only z = 1 would be inferred at node k, instead of (t = 1, z = 1).

# 5 Implementation and Evaluation

We have implemented prototypes of both the Forward DFAS and Backward DFAS approaches in Java. Both implementations have been parallelized using the ThreadPool library. With Backward DFAS, the iterations of the outer "repeat" loop in Algorithm 1 run in parallel, while with Forward DFAS, propagations of values from different nodes to their respective successors happen in parallel. Our implementations currently target systems without procedure calls, as none of our benchmarks had recursive procedure calls.

Our implementations accept a given system, and a "target" control state q in one of the processes of the system at which the JOFP is desired. They then construct the VCFG from the system (see Section 2.1), and identify the target set of q, which is the set of VCFG nodes in which q is a constituent. For instance, in Figure 2, the target set for control state e is {(a, e),(b, e)}. The JOFPs at the nodes in the target set are then computed, and the join of these JOFPs is returned as the result for q.
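The target-set computation described above is a simple membership filter over VCFG nodes. A sketch, assuming VCFG nodes are encoded as tuples of per-process control states (our encoding, for illustration only):

```python
def target_set(vcfg_nodes, q):
    """All VCFG nodes in which control state q is a constituent."""
    return {node for node in vcfg_nodes if q in node}

# Mirroring the Figure 2 example: the target set of control state e
nodes = {("a", "e"), ("b", "e"), ("a", "f"), ("b", "f")}
print(sorted(target_set(nodes, "e")))  # [('a', 'e'), ('b', 'e')]
```

The final result for q would then be the join of the JOFPs computed at each node in this set.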

Each variable reference in any transition leaving any control state is called a "use". For instance, in Figure 2, the reference to variable x along the outgoing transition from state d is one use. In all our experiments, the objective is to find the uses that are definitely constants by computing the JOFP at all uses. This is a common objective in many research papers, as finding constants enables optimizations such as constant folding, and also checking assertions in the code. We instantiate Forward DFAS with the Constant Propagation (CP) analysis, and Backward DFAS with the LCP analysis (for the reason discussed in Section 3.1). We use the bound κ = 2 in all runs of Forward DFAS, except with two benchmarks which are too large to scale to this bound. We discuss this later in this section. All the experiments were run on a machine with 128GB RAM and four AMD Opteron 6386 SE processors (64 cores total).

### 5.1 Benchmarks and modeling



We use 14 benchmarks for our evaluations. These are described in the first two columns of Table 1. Four benchmarks – bartlett, leader, lynch, and peterson – are Promela models for the Spin model-checker. Three benchmarks – boundedAsync, receive1, and replicatingStorage – are from the P language repository (www.github.com/p-org). Two benchmarks – server and chameneos – are from the Basset repository (www.github.com/SoftwareEngineeringToolDemos/FSE-2010-Basset). Four benchmarks – event\_bus\_test, jobqueue\_test, nursery\_test, and bookCollectionStore – are real-world Go programs. Finally, there is one toy example, "mutex", which we wrote ourselves; it ensures mutual exclusion via blocking receive operations. We provide precise links to the benchmarks in the appendix [4].

Our DFAS implementations expect the asynchronous system to be specified in an XML format. We have developed a custom XML schema for this, closely based on the Promela modeling language used in Spin [26]. We followed this direction in order to be able to evaluate our approach on examples from different languages. We manually translated each benchmark into an XML file, which we call a model. As the input XML schema is close to Promela, the Spin models were easily translated. Other benchmarks had to be translated to our XML schema by understanding their semantics.

Note that both our approaches are expensive in the worst-case (exponential or worse in the number of counters r). Therefore, we have chosen benchmarks that are moderate in their complexity metrics. Still, these benchmarks are real and contain complex logic (e.g., the leader election example from Promela, which was discussed in detail in Section 1.1). We have also performed some manual simplifications to the benchmarks to aid scalability (discussed below). Our evaluation is aimed towards understanding the impact on precision due to infeasible paths in real benchmarks, and not necessarily to evaluate applicability of our approach to large systems.

We now list some of the simplifications referred to above. Language-specific idioms that were irrelevant to the core logic of the benchmark were removed. The number of instances of identical processes in some of the models was reduced in a behavior-preserving manner according to our best judgment. In many of the benchmarks, messages carry a payload, usually of one byte. We would have needed 256 counters just to encode the payload of one 1-byte message. Therefore, in the interest of keeping the analysis time manageable, the payload size was reduced to 1 bit or 2 bits. The reduction was done while preserving key behavioral aspects according to our best judgment. Finally, procedure calls were inlined (there was no use of recursion in the benchmarks).

In the rest of this section, whenever we say "benchmark", we actually mean the model we created corresponding to the benchmark. Table 1 also shows various metrics of our benchmarks (based on the XML models). Columns 3–6 depict, respectively, the number of processes, the total number of variables, the number of "counters" r, and the total number of nodes in the VCFG. We provide our XML models of all our benchmarks, as well as full output files from the runs of our approach, as a downloadable folder (https://drive.google.com/drive/folders/181DloNfm6\_UHFyz7qni8rZjwCp-a8oCV).

### 5.2 Data flow analysis results


Table 2. Data flow analysis results

We structure our evaluation as a set of research questions (RQs) below. Table 2 summarizes results for the first three RQs, while Table 3 summarizes results for RQ 4.

RQ 1: How many constants are identified by the Forward and Backward DFAS approaches? Column (2) in Table 2 shows the number of uses in each benchmark. Columns (4)-Forw and (4)-Back show the number of uses identified as constants by the Forward and Backward DFAS approaches, respectively. In total across all benchmarks Forward DFAS identifies 63 constants whereas Backward DFAS identifies 49 constants.

Although in aggregate Backward DFAS appears weaker than Forward DFAS, Backward DFAS infers more constants than Forward DFAS in two benchmarks – jobqueue\_test and bookCollectionStore. Therefore, the two approaches are actually incomparable. The advantage of Forward DFAS is that it can use relatively more precise analyses like CP that do not satisfy the assumptions of Backward DFAS, while the advantage of Backward DFAS is that it always computes the precise JOFP.

RQ 2: How many assertions are verified by the approaches? Verifying assertions that occur in code is a useful activity as it gives confidence to developers. All but one of our benchmarks had assertions (in the original code itself, before modeling). We carried over these assertions into our models. For instance, for the benchmark leader, the assertion appears in Line 11 in Figure 1. In some benchmarks, like jobqueue\_test, the assertions were part of test cases. It makes sense to verify these assertions as well, as unlike in testing, our technique considers all possible interleavings of the processes. As "bookCollectionStore" did not come with any assertions, a graduate student who was unfamiliar with our work studied the benchmark and suggested assertions.

Column (3) in Table 2 shows the number of assertions present in each benchmark. Columns (5)-Forw and (5)-Back in Table 2 show the number of assertions declared as safe (i.e., verified) by the Forward and Backward DFAS approaches, respectively. An assertion is considered verified iff constants (as opposed to "⊤") are inferred for all the variables used in the assertion, and these constants satisfy the assertion. As can be seen from the last row in Table 2, both approaches verify a substantial percentage of all the assertions – 52% by Forward DFAS and 48% by Backward DFAS. We believe these results are surprisingly useful, given that our technique needs no loop invariants or usage of theorem provers.
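The verification criterion just stated can be sketched as a small check (the names and the "⊤"-for-unknown encoding are our illustrative assumptions, not the paper's code):

```python
def assertion_verified(inferred, variables, predicate):
    """An assertion is verified iff every variable it mentions is inferred
    to be a constant (not the unknown value "T"), and those constants
    satisfy the assertion's predicate."""
    vals = [inferred.get(v, "T") for v in variables]
    if any(v == "T" for v in vals):
        return False          # some variable is not a known constant
    return predicate(*vals)

# Constants inferred at some node, with x not a constant:
inferred = {"t": 1, "z": 1, "x": "T"}
print(assertion_verified(inferred, ["t", "z"], lambda t, z: t == 1 and z == 1))  # True
print(assertion_verified(inferred, ["x"], lambda x: x > 0))                      # False
```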

RQ 3: Are the DFAS approaches more precise than baseline approaches? We compare the DFAS results with two baseline approaches. The first baseline is a Join-Over-all-Paths (JOP) analysis, which basically performs CP analysis on the VCFG without eliding any infeasible paths. Columns (6)-JOP and (7)-JOP in Table 2 show the number of constants inferred and the number of assertions verified by the JOP baseline. It can be seen that Backward DFAS identifies 2.2 times the number of constants as JOP, while Forward DFAS identifies 2.9 times the number of constants as JOP (see columns (4)-Forw, (4)-Back, and (6)-JOP in the Total row in Table 2). In terms of assertions, each of them verifies almost 5 times as many assertions as JOP (see columns (5)-Forw, (5)-Back, and (7)-JOP in the Total row in Table 2). It is clear from the results that eliding infeasible paths is extremely important for precision.

The second baseline is Copy Constant Propagation (CCP) [50]. This is another variant of constant propagation that is even less precise than LCP. However, it is based on a finite lattice, specifically, an IFDS [50] lattice. Hence this baseline represents the capability of the closest related work to ours [29], which elides infeasible paths but supports only IFDS lattices, which are a sub-class of finite lattices. (Their implementation also used a finite lattice of predicates, but we are not aware of a predicate-identification tool that would work on our benchmarks out of the box.) We implemented the CCP baseline within our Backward DFAS framework. This baseline hence computes the JOFP using CCP (i.e., it elides infeasible paths).

Columns (6)-CCP and (7)-CCP in Table 2 show the number of constants inferred and the number of assertions verified by the CCP baseline. From the Total row in Table 2 it can be seen that Forward DFAS finds 62% more constants than CCP, while Backward DFAS finds 26% more constants than CCP. With respect to number of assertions verified, the respective gains are 57% and 43%. In other words, infinite domains such as CP or LCP can give significantly more precision than closely related finite domains such as CCP.


Table 3. Execution time in seconds

RQ 4: How does the execution cost of DFAS approaches compare to the cost of the JOP baseline? The columns in Table 3 correspond to the benchmarks (only first three letters of each benchmark's name are shown in the interest of space). The rows show the running times for Forward DFAS, Backward DFAS, JOP baseline, and CCP baseline, respectively.

The JOP baseline was quite fast on almost all benchmarks (except lynch). This is because it maintains just a single data flow fact per VCFG node, in contrast to our approaches. Forward DFAS was generally quite efficient, except on chameneos and lynch. On these two benchmarks, it scaled only with κ = 1 and κ = 0, respectively, encountering memory-related crashes at higher values of κ (we used κ = 2 for all other benchmarks). These two benchmarks have a large number of nodes and a high value of r, which increases the size of the data flow facts.

The running time of Backward DFAS is substantially higher than the JOP baseline. One reason for this is that being a demand-driven approach, the approach is invoked separately for each use (Table 2, Col. 2), and the cumulative time across all these invocations is reported in the table. In fact, the mean time per query for Backward DFAS is less than the total time for Forward DFAS on 9 out of 14 benchmarks, in some cases by a factor of 20x. Also, unlike Forward DFAS, Backward DFAS visits a small portion of the VCFG in each invocation. Therefore, Backward DFAS is more memory efficient and scales to all our benchmarks. Every invocation of Backward DFAS consumed less than 32GB of memory, whereas with Forward DFAS, three benchmarks (leader, replicatingStorage, and jobqueue\_test) required more than 32GB, and two (lynch and chameneos) needed more than the 128 GB that was available in the machine. On the whole, the time requirement of Backward DFAS is still acceptable considering the large precision gain over the JOP baseline.

### 5.3 Limitations and Threats to Validity

The results of the evaluation using our prototype implementation are very encouraging, in terms of both usefulness and efficiency. The evaluation does however pose some threats to the validity of our results. The benchmark set, though extracted from a wide set of sources, may not be exhaustive in its idioms. Also, while modeling, we had to simplify some of the features of the benchmarks in order to let the approaches scale. Therefore, applicability of our approach directly on real systems with all their language-level complexities, use of libraries, etc., is not yet established, and would be a very interesting line of future work.

# 6 Related Work

The modeling and analysis of parallel systems, which include asynchronous systems, multi-threaded systems, distributed systems, event-driven systems, etc., has been the focus of a large body of work, for a very long time. We discuss some of the more closely related previous work, by dividing the work into four broad categories.

Data Flow Analysis: The work of Jhala et al. [29] is the closest work that addresses similar challenges as ours. They combine the Expand, Enlarge and Check (EEC) algorithm [21], which answers control state reachability in WSTS [18], with the unordered channel abstraction and the IFDS [50] algorithm for data flow analysis, to compute the JOFP solution for all nodes. They admit only IFDS abstract domains, which are finite by definition. Some recent work has extended this approach for analyzing JavaScript [60] and Android [45] programs. Both our approaches differ from theirs, and we admit infinite lattices (like CP and LCP). On the other hand, their approach is able to handle parameter passing between procedures, which we do not.

Bronevetsky et al. [8] address generalized data flow analysis of a very restricted class of systems, where any receive operation must receive messages from a specific process, and channel contents are not allowed to cause non-determinism in control flow. Other work has addressed analysis of asynchrony in web applications [28,42]. These approaches are efficient, but over-approximate the JOFP by eliding only certain specific types of infeasible paths.

Formal Modeling and Verification: Verification of asynchronous systems has received a lot of attention over a long time. VASS [31] and Petri nets [49] (which both support unordered channel abstraction) have been used widely to model parallel and asynchronous processes [31,38,54,29,19,5]. Different analysis problems based on these models have been studied, such as reachability of configurations [7,43,34,35], coverability and boundedness [31,3,2,18,21,6], and coverability in the presence of stacks or other data structures [57,5,9,10,40].

The coverability problem mentioned above is considered equivalent to control state reachability, and has received wide attention [1,14,29,19,54,20,33,5,56]. Abdulla et al. [3] were the first to provide a backward algorithm to answer coverability. Our Backward DFAS approach is structurally similar to their approach, but is a strict generalization, as we incorporate data flow analysis using infinite abstract domains. (It is noteworthy that when the abstract domain is finite, data flow analysis can be reduced to coverability.) One difference is that we use the unordered channel abstraction, while they use the lossy channel abstraction. It is possible to modify our approach to use lossy channels as well (when there are no procedure calls, which they also do not allow); we omit the formalization of this due to lack of space.

Bouajjani and Emmi [5] generalize over previous coverability results by solving the coverability problem for a class of multi-procedure systems called recursively parallel programs. Their class of systems is somewhat broader than ours, as they allow a caller to receive the messages sent by its callees. Our ComputeEndToEnd routine in Algorithm 2 is structurally similar to their approach. They admit finite abstract domains only. It would be interesting future work to extend the Backward DFAS approach to their class of systems.

Our approaches explore all interleavings between the processes, following the Spin semantics. In contrast, the closest previous approaches [29,5] only address "event-based" systems, in which a set of processes execute sequentially without interleaving at the statement level, but over an unbounded schedule (i.e., each process executes from start to finish whenever it is scheduled).

Other forms of verification: Proof-based techniques have been explored for verifying asynchronous and distributed systems [24,58,47,22]. These techniques need inductive invariants, and are not as user-friendly as data flow analysis techniques. Behavioral types have been used to tackle specific analysis problems such as deadlock detection and correct usage of channels [36,37,52].

Testing and Model Checking: Languages and tools such as Spin and Promela [26], P [15], P# [13], and JPF-Actor [39] have been used widely to model-check asynchronous systems. A lot of work has been done in testing of asynchronous systems [16,13,53,23,59] as well. Such techniques are bounded in nature and cannot provide the strong verification guarantees that data flow analysis provides.

# 7 Conclusions and Future Work

In spite of the substantial body of work on analysis and verification of distributed systems, there is no existing approach that performs precise data flow analysis of such systems using infinite abstract domains, which are otherwise very commonly used with sequential programs. We propose two data flow analysis approaches that solve this problem – one always computes the precise JOFP solution, while the other admits a fully general class of infinite abstract domains. We have implemented both approaches, analyzed 14 benchmarks using the implementations, and observed substantially higher precision from our approaches than from two different baseline approaches.

Our approach can be extended in many ways. One interesting extension would be to make Backward DFAS work with infinite height lattices, using widening. Another possible extension could be the handling of parameters in procedure calls. There is significant scope for improving the scalability using better engineering, especially for Forward DFAS. One could explore the integration of partial-order reduction [11] into both our approaches. Finally, we would like to build tools based on our approach that apply directly to programs written in commonly-used languages for distributed programming.

# References


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Types for Complexity of Parallel Computation in Pi-Calculus**

Patrick Baillot and Alexis Ghyselen

Univ Lyon, CNRS, ENS de Lyon, Université Claude Bernard Lyon 1, LIP, F-69342, Lyon Cedex 07, France
alexis.ghyselen@ens-lyon.fr

**Abstract.** Type systems as a technique to analyse or control programs have been extensively studied for functional programming languages. In particular, some systems allow one to extract from a typing derivation a complexity bound on the program. We explore how to extend such results to parallel complexity in the setting of the pi-calculus, considered as a communication-based model for parallel computation. Two notions of time complexity are given: the total computation time without parallelism (the work) and the computation time under maximal parallelism (the span). We define operational semantics to capture those two notions, and present two type systems from which one can extract a complexity bound on a process. The type systems are inspired both by size types and by input/output types, with additional temporal information about communications.

**Keywords:** Type Systems · Pi-calculus · Process Calculi · Complexity Analysis · Implicit Computational Complexity · Size Types

# **1 Introduction**

The problem of certifying time complexity bounds for programs is a challenging question, related to the problem of statically inferring time complexity, and it has been extensively studied in the setting of sequential programming languages. One particular approach to these questions is that of type systems, which offers the advantage of providing an analysis that is formally grounded, compositional and modular. In the functional framework several rich type systems have been proposed such that, if a program can be assigned a type, then one can extract from the type derivation a complexity bound for its execution on any input (see e.g. [21,25,22,20,6,4]). The type system itself thus provides a complexity certification procedure, and if a type inference algorithm is also provided, one obtains a complexity inference procedure. This research area is also related to implicit computational complexity, which aims at providing type systems or static criteria to characterize some complexity classes within a programming language (see e.g. [24,13,33,18,15]), and which has sometimes subsequently inspired a complexity certification or inference procedure.

However, while the topic of complexity certification has been thoroughly investigated for sequential programs, both for space and time bounds, there have been only a few contributions in the settings of parallel programs and distributed systems. In these contexts, several notions of cost can be of interest to abstract the computation time. First, one may wish to know the total computation time accumulated over all processors during a program execution. This is called the work of the program. Second, one can ask: if an infinite number of processors were available, what would be the execution time of the program when it is maximally parallelized? This is called the span or depth of the program.

The paper [23] has addressed the problem of analysing the time complexity of programs written in a parallel first-order functional language. In this language one can spawn computations in parallel and use the resulting values in the body of the program. This makes it possible to express a large range of classical parallel algorithms. Their approach is based on amortized complexity and builds on a line of work on type systems for sequential languages, which allow one to derive bounds on the work and the span of the program. However, the language they investigate does not allow communication between the parallel computations. Our goal is to provide an approach for analysing the time complexity of programs written in a rich language for communication-based parallel computation, allowing the representation of several synchronization features. For this we use the π-calculus, a process calculus which provides process creation, channel name creation and name-passing in communication. An alternative approach could be to use a language described with session types, as in [9,10]. We discuss the expressivity of both languages in Section 4.2.

We want to propose methods that, given a parallel program written in the π-calculus, derive upper bounds on its work and span. Let us mention that these notions are not only of theoretical interest. Some classical results provide upper bounds, expressed by means of the work (w) and span (s), on the evaluation time of a parallel program on a given number p of processors. For instance, such a program can be evaluated on a shared-memory multiprocessor (SMP) with p processors in time O(max(w/p, s)) (see e.g. [19]).
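To make the role of these two quantities concrete, here is a minimal sketch (ours, not from the paper; the function name is an assumption) of the quoted bound: with work w, span s and p processors, evaluation time stays within O(max(w/p, s)).

```python
def schedule_bound(work, span, p):
    """Upper bound (up to constant factors) on the evaluation time of a
    parallel program with the given work and span on p processors,
    following the classical max(w/p, s) bound quoted in the text."""
    return max(work / p, span)

# With 100 processors, a program of work 1000 and span 10 is
# span-bound; with only 10 processors, the work term dominates.
assert schedule_bound(1000, 10, 100) == 10
assert schedule_bound(1000, 10, 10) == 100.0
```

The span s is thus the part of the cost that no amount of extra processors can remove.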

Our goal in this paper is essentially fundamental and methodological, in the sense that we aim at proposing type systems that are general enough, well-behaved, and provide good complexity properties. At this stage we do not yet focus on the design and efficiency of type inference algorithms.

We want to be able to derive complexity bounds that are parametric in the size of inputs, for instance bounds depending on the length of a list. For that it is useful to have a language of types that can carry information about sizes, and for this reason we take inspiration from sized types [26,6]. So data types will be annotated with an index providing information on the size of values. Our approach then follows the standard approach to typing in the π-calculus, namely typing a channel by giving the types of the messages that can be sent or received through it. A second ingredient will also be necessary for us: input/output types. In this setting a channel is given a set of capabilities: it can be an input, an output, or have both input/output capabilities.

**Contributions.** We consider a π-calculus with an explicit tick construct; this allows us to specify several cost models, instead of only counting the number of reduction steps. Two semantics of this π-calculus are proposed, to define formally the work and the span of a process. We then design two type systems for the π-calculus, one for the work and one for the span, and establish a soundness theorem for both: if a process is well-typed in the first (resp. second) type system, then its type provides an expression which, for its execution on any input, bounds the work (resp. span). This approach by type systems is generic: the soundness proof relies on subject reduction, and it gives a compositional and flexible result that could be adapted to extensions of the base language.

**Discussion.** Note that even though one of the main usages of the π-calculus is to specify and analyse concurrent systems, the present paper does not aim at analysing the complexity of arbitrary concurrent π-calculus programs. Indeed, some typical examples of concurrent systems, like semaphores, will simply not be typable in the system for span (see Sect. 4.2), because of linearity conditions. As explained above, our interest here is instead focused on parallel computation expressed in the π-calculus, which can include some forms of cooperative concurrency. We believe the analysis of complexity bounds for the concurrent π-calculus is another challenging question, which we want to address in future work.

A comparison with related work is given in Sect. 6.

# **2 The Pi-calculus with Semantics for Work and Span**

In this work, we consider the π-calculus as a model of parallelism. The main points of π-calculus are that processes can be composed in parallel, communication between processes happens with the use of channels, and channel names can be created dynamically.

### **2.1 Syntax, Congruence and Standard Semantics for** *π***-Calculus**

We present here a classical syntax for the asynchronous π-calculus. More details about π-calculus and variants of the syntax can be found in [34]. We define the sets of variables, expressions and processes by the following grammar.

$$v := x, y, z \mid a, b, c \qquad e := v \mid \mathtt{0} \mid \mathtt{s}(e) \mid [\,] \mid e :: e'$$

$$P, Q := \mathtt{0} \mid (P \mid Q) \mid\ !a(\tilde{v}).P \mid a(\tilde{v}).P \mid \overline{a}\langle\tilde{e}\rangle \mid (\nu a)P \mid \mathtt{tick}.P$$

$$\mid \mathtt{match}(e)\ \{\mathtt{0} \mapsto P;\; \mathtt{s}(x) \mapsto Q\} \mid \mathtt{match}(e)\ \{[\,] \mapsto P;\; x :: y \mapsto Q\}$$

Variables x, y, z denote base-type variables: they represent integers or lists. Variables a, b, c denote channel names. The notation ṽ stands for a sequence of variables v1, v2, ..., vk. In the same way, ẽ is a sequence of expressions. We work up to α-renaming, and we write P[ṽ := ẽ] for the substitution of the free variables ṽ in P by ẽ. For the sake of simplicity, we consider only integers and lists as base types in the following, but the results can be generalized to other algebraic data types.

Intuitively, P | Q stands for the parallel composition of P and Q. The process a(ṽ).P represents an input: it stands for the reception on the channel a of a tuple of values, identified by the variables ṽ in the continuation P. The process !a(ṽ).P is a replicated version of a(ṽ).P; it behaves like an infinite number of copies of a(ṽ).P in parallel. The process ā⟨ẽ⟩ represents an output: it sends a sequence of expressions on the channel a. A process (νa)P dynamically creates a new channel name a and then proceeds as P. We also have classical pattern matching on data types, and finally, in tick.P, the tick incurs an additional cost of one. This construct is the source of time complexity in a program. It can represent different cost models and is more general than merely counting the number of reduction steps. For example, by adding a tick after each input, we can count the number of communications in a process. By adding it after each replicated input on a channel a, we can count the number of calls to a. And if we want to count the number of reduction steps, we can add a tick after each input and each pattern matching.
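The grammar above can be transcribed directly as an abstract syntax tree. The sketch below (our own representation, not part of the paper) does so in Python and computes free channel names, illustrating that (νa) is the only binder for channel names:

```python
from dataclasses import dataclass

# A minimal AST for the process grammar above (constructor names are
# ours); expressions are kept opaque.
@dataclass
class Nil: pass                                   # 0
@dataclass
class Par: left: object; right: object            # P | Q
@dataclass
class In: chan: str; vars: tuple; cont: object    # a(v~).P
@dataclass
class Repl: chan: str; vars: tuple; cont: object  # !a(v~).P
@dataclass
class Out: chan: str; exprs: tuple                # a<e~>
@dataclass
class Nu: name: str; body: object                 # (nu a)P
@dataclass
class Tick: cont: object                          # tick.P

def free_channels(p):
    """Free channel names of a process; (nu a) binds a in its body."""
    if isinstance(p, Nil): return set()
    if isinstance(p, Par): return free_channels(p.left) | free_channels(p.right)
    if isinstance(p, (In, Repl)): return {p.chan} | free_channels(p.cont)
    if isinstance(p, Out): return {p.chan}
    if isinstance(p, Nu): return free_channels(p.body) - {p.name}
    if isinstance(p, Tick): return free_channels(p.cont)

# (nu a)(a().0 | b<>) has only b free
assert free_channels(Nu("a", Par(In("a", (), Nil()), Out("b", ())))) == {"b"}
```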

We can now describe the classical semantics for this calculus. We first define on those processes a congruence relation ≡ : this is the least congruence relation closed under:

$$P \mid 0 \equiv P \qquad P \mid Q \equiv Q \mid P \qquad P \mid (Q \mid R) \equiv (P \mid Q) \mid R$$

(νa)(νb)P ≡ (νb)(νa)P (νa)(P | Q) ≡ (νa)P | Q (when a is not free in Q)

Note that the last rule can always be applied from right to left by α-renaming. Also, contrary to the usual congruence relation for the π-calculus, we do not consider the rule for replicated input (!P ≡ !P | P), as it will be captured by the semantics, and α-conversion is not taken as an explicit rule in the congruence. By associativity, we will often write parallel compositions of any number of processes, and not only two. Another way to see this congruence relation is that, up to congruence, a process is entirely described by a set of channel names and a multiset of processes. Formally, we give the following definition.

**Definition 1 (Guarded Processes and Canonical Form).** A process G is guarded if it has one of the following shapes:

$$G :=\ !a(\tilde{v}).P \mid a(\tilde{v}).P \mid \overline{a}\langle\tilde{e}\rangle \mid \mathtt{tick}.P \mid$$

$$\mathtt{match}(e)\ \{\mathtt{0} \mapsto P;\; \mathtt{s}(x) \mapsto Q\} \mid \mathtt{match}(e)\ \{[\,] \mapsto P;\; x :: y \mapsto Q\}$$

We say that a process is in canonical form if it has the form $(\nu \tilde{a})(G_1 \mid \dots \mid G_n)$ with $G_1, \dots, G_n$ guarded processes.

The properties of this canonical form can be found in the technical report [5]; here we only use it to give an intuition of how one could understand a process. It is enough to know that for each process P, there is a process in canonical form congruent to P. Moreover, this canonical form is unique up to the ordering of names and processes, and up to congruence inside guarded processes.
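As an illustration of this canonical form, the following sketch (our own tuple-based representation, not from the paper) flattens parallel compositions, floats ν-binders to the top and drops 0 processes, assuming all bound names are pairwise distinct so that scope extrusion is always allowed:

```python
def canonical(p, names=None, guarded=None):
    """Return (names, guarded) such that p is congruent to
    (nu names)(guarded[0] | ... | guarded[-1]).  Processes are nested
    tuples: ('par', P, Q), ('nu', a, P), ('nil',), anything else is
    treated as a guarded process."""
    names = [] if names is None else names
    guarded = [] if guarded is None else guarded
    tag = p[0]
    if tag == 'par':
        canonical(p[1], names, guarded)
        canonical(p[2], names, guarded)
    elif tag == 'nu':
        names.append(p[1])          # float the binder to the top
        canonical(p[2], names, guarded)
    elif tag == 'nil':
        pass                        # 0 disappears by P | 0 == P
    else:
        guarded.append(p)           # inputs, outputs, tick., match
    return names, guarded

p = ('par', ('nu', 'a', ('out', 'a')), ('par', ('tick', ('nil',)), ('nil',)))
assert canonical(p) == (['a'], [('out', 'a'), ('tick', ('nil',))])
```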

We can now define the usual reduction relation for the π-calculus, which we denote P → Q. It is defined by the rules given in Figure 1. The rules for integers are not detailed, as they can be deduced from the ones for lists. Remark that substitutions must be well-defined in order to perform some reduction steps: channel names must be substituted by other channel names, and base-type variables can be substituted by any expression except channel names. However, when we consider typed processes, this will always yield well-defined substitutions.


**Fig. 1.** Standard Reduction Rules

For now, this relation cannot reduce a process of the form tick.P, so we need to introduce a reduction rule for tick. From this semantics, we will define a reduction corresponding to total complexity (work). Then, we will define parallel complexity (span) by taking an expansion of the standard reduction.

### **2.2 Semantics and Complexity**

**Work.** We first describe a semantics for the work, that is to say the total number of ticks during a reduction, without accounting for parallelism. The reduction →1 is defined in Figure 2. Intuitively, this reduction removes exactly one tick at the top level.


**Fig. 2.** Simple Tick Reduction Rules

Then, from any process P, a sequence of reduction steps to Q is just a sequence of one-step reductions with → or →1, and the work complexity of this sequence is the number of →1 steps. In this paper we always consider worst-case complexity, so the work of a process is defined as the maximal complexity over all such sequences of reduction steps from this process.

Notice that with this semantics for work, adding tick in a process does not change its behaviour: we neither create nor erase reduction paths.
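This observation suggests a simple syntactic bound, sketched below with our own tuple representation (not from the paper): for a replication-free process, counting tick constructors bounds the work, since each →1 step consumes one tick and no construct duplicates one. Pattern matching may discard ticks, so summing both branches is coarse but safe.

```python
def tick_bound(p):
    """Syntactic upper bound on the work of a replication-free process.
    Representation: ('tick', P), ('par', P, Q), ('nu', a, P),
    ('in', a, vars, P), ('match', P, Q), ('nil',), ('out', a)."""
    tag = p[0]
    if tag == 'tick':
        return 1 + tick_bound(p[1])
    if tag == 'par':
        return tick_bound(p[1]) + tick_bound(p[2])   # work adds up
    if tag in ('nu', 'in'):
        return tick_bound(p[-1])                     # look under the binder
    if tag == 'match':
        return tick_bound(p[1]) + tick_bound(p[2])   # both branches: coarse
    return 0                                         # 0 and outputs

# tick.0 | tick.tick.0 has work at most 3
p = ('par', ('tick', ('nil',)), ('tick', ('tick', ('nil',))))
assert tick_bound(p) == 3
```

With replication such a count no longer exists, which is precisely why the type system of Section 3 is needed.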

**Span.** A more interesting notion of complexity in this calculus is the parallel one. Before presenting the semantics, we illustrate with some simple examples the kind of properties we want for this parallel complexity.

First, we want a parallel complexity that behaves as if we had an infinite number of processors. So, on the process tick.0 | tick.0 | tick.0 | ··· | tick.0, we want the complexity to be 1, whatever the number of ticks in parallel.

Moreover, reductions with zero-cost complexity (in our setting, this means all reductions except those reducing a tick) should not harm this maximal parallelism. For example, a().tick.0 | ā⟨⟩ | tick.0 should also have complexity one, because intuitively the synchronization between the input and the output can be done independently of the tick on the right, and then the tick on the left can be reduced in parallel with the tick on the right.

Finally, as before for the work, adding a tick should not change the behaviour of a process. For instance, consider the process tick.a().P0 | a().tick.P1 | ā⟨⟩, where a is not used in P0 and P1. This process should have complexity max(1 + C0, 1 + C1), where Ci is the cost of Pi. Indeed, there are two possible reductions: either we reduce the tick, then synchronize the left input with the output, and continue with P0; or we first synchronize the right input with the output, then reduce the ticks, and finally continue as P1.

A possible way to define such a parallel complexity would be to take causal complexity [13,12,11]; however, we believe there is a simpler presentation for our case. In the technical report [5], we prove the equivalence between causal complexity and the notion presented here. The idea, proposed by Naoki Kobayashi (private communication), consists in introducing a new constructor for processes, m : P, where m is an integer. A process using this constructor will be called an annotated process. Intuitively, the annotated process m : P means P preceded by m ticks. We can then enrich the congruence relation ≡ with the following rules:

$$m: (P \mid Q) \equiv (m:P) \mid (m:Q) \qquad m: (\nu a)P \equiv (\nu a)(m:P)$$

$$m: (n:P) \equiv (m+n):P \qquad 0: P \equiv P$$

This intuitively means that ticks can be distributed over parallel composition, name creation can be done before or after ticks without changing the semantics, ticks can be grouped together, and zero ticks is equivalent to nothing.
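These congruence rules can be read as a normalisation procedure. The sketch below (our own tuple representation, not from the paper) pushes annotations down to the guarded subprocesses, merging nested annotations along the way, which is how the canonical form of the next definition is reached:

```python
def push(p, n=0):
    """Push every annotation onto the guarded subprocesses of p, using
    m:(P|Q) == (m:P)|(m:Q), m:(nu a)P == (nu a)(m:P), m:(n:P) == (m+n):P
    and 0:P == P.  n counts ticks accumulated from enclosing annotations.
    Representation: ('ann', m, P), ('par', P, Q), ('nu', a, P);
    anything else is treated as guarded."""
    tag = p[0]
    if tag == 'ann':                       # m : P — accumulate and recurse
        return push(p[2], n + p[1])
    if tag == 'par':
        return ('par', push(p[1], n), push(p[2], n))
    if tag == 'nu':
        return ('nu', p[1], push(p[2], n))
    return ('ann', n, p) if n > 0 else p   # guarded process; 0:P == P

p = ('ann', 2, ('par', ('ann', 3, ('out', 'a')), ('tick', ('nil',))))
assert push(p) == ('par', ('ann', 5, ('out', 'a')), ('ann', 2, ('tick', ('nil',))))
```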

With this congruence relation and this new constructor, we can give a new shape to the canonical form presented in Definition 1.

**Definition 2 (Canonical Form for Annotated Processes).** An annotated process is in canonical form if it has the shape:

$$(\nu \tilde{a})(n_1 : G_1 \mid \dots \mid n_m : G_m)$$

with $G_1, \dots, G_m$ guarded annotated processes.

Remark that the congruence relation above allows us to obtain this canonical form from any annotated process. With this intuition in mind, we can define a reduction relation ⇒p for annotated processes. The rules are given in Figure 3. We do not detail the rules for integers, as they can be deduced from the ones for lists. Intuitively, this semantics works like the usual semantics for the π-calculus, but when doing a synchronization we keep the maximal annotation, and ticks are memorized in the annotations.

$$\frac{}{(n : a(\tilde{v}).P) \mid (m : \overline{a}\langle\tilde{e}\rangle) \Rightarrow_p (\max(m,n) : P[\tilde{v} := \tilde{e}])} \qquad \frac{}{\mathtt{tick}.P \Rightarrow_p 1 : P}$$

$$\frac{}{(n :\ !a(\tilde{v}).P) \mid (m : \overline{a}\langle\tilde{e}\rangle) \Rightarrow_p (n :\ !a(\tilde{v}).P) \mid (\max(m,n) : P[\tilde{v} := \tilde{e}])}$$

$$\frac{}{\mathtt{match}([\,])\ \{[\,] \mapsto P;\; x :: y \mapsto Q\} \Rightarrow_p P} \qquad \frac{}{\mathtt{match}(e :: e')\ \{[\,] \mapsto P;\; x :: y \mapsto Q\} \Rightarrow_p Q[x, y := e, e']}$$

$$\frac{P \Rightarrow_p Q}{(\nu a)P \Rightarrow_p (\nu a)Q} \qquad \frac{P \Rightarrow_p Q}{P \mid R \Rightarrow_p Q \mid R} \qquad \frac{P \Rightarrow_p Q}{n : P \Rightarrow_p n : Q} \qquad \frac{P \equiv P' \quad P' \Rightarrow_p Q' \quad Q' \equiv Q}{P \Rightarrow_p Q}$$

**Fig. 3.** Reduction Rules

We can then define the parallel complexity of an annotated process.

**Definition 3 (Parallel Complexity).** Let P be an annotated process. We define its local complexity Cℓ(P) by:

$$\mathcal{C}_\ell(n : P) = n + \mathcal{C}_\ell(P) \qquad \mathcal{C}_\ell(P \mid Q) = \max(\mathcal{C}_\ell(P), \mathcal{C}_\ell(Q))$$

$$\mathcal{C}_\ell((\nu a)P) = \mathcal{C}_\ell(P) \qquad \mathcal{C}_\ell(G) = 0 \text{ if } G \text{ is a guarded process}$$

Equivalently, Cℓ(P) is the maximal integer that appears in the canonical form of P. Then, for an annotated process P, its global parallel complexity is given by max{n | P ⇒p* Q ∧ Cℓ(Q) = n}, where ⇒p* is the reflexive and transitive closure of ⇒p.
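Definition 3 translates directly into code. The following sketch (our own representation, with ('ann', n, P) standing for n : P) computes the local complexity of an annotated process:

```python
def local_complexity(p):
    """C_l from Definition 3: annotations add, parallel takes the max,
    name creation is transparent, guarded processes contribute 0."""
    tag = p[0]
    if tag == 'ann':                       # n : P
        return p[1] + local_complexity(p[2])
    if tag == 'par':
        return max(local_complexity(p[1]), local_complexity(p[2]))
    if tag == 'nu':
        return local_complexity(p[2])
    return 0                               # guarded process

# (nu a)( 2:G | 1:(3:G) ) has local complexity max(2, 1+3) = 4
p = ('nu', 'a', ('par', ('ann', 2, ('out', 'a')),
                 ('ann', 1, ('ann', 3, ('out', 'a')))))
assert local_complexity(p) == 4
```

The global parallel complexity then maximizes this quantity over all ⇒p-reachable processes.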

To show that this parallel complexity is well-behaved, we give the following lemma.

**Lemma 1 (Reduction and Local Complexity).** Let P, P′ be annotated processes such that P ⇒p P′. Then, we have Cℓ(P′) ≥ Cℓ(P).

This lemma is proved by induction. The main point is that guarded processes have a local complexity equal to zero, so performing a reduction can never decrease the local complexity. Thus, in order to bound the complexity of an annotated process, we reduce it with ⇒p and take the maximum local complexity over all normal forms. Moreover, this semantics respects the conditions given at the beginning of this section.

# **2.3 An Example Process**

As an example, we show a way to encode a usual functional program in the π-calculus. In order to do this, we use a replicated input to encode a function, and we use a return channel for the output. So, given a channel f representing a function F, such that the output f⟨y, a⟩ returns F(y) on the channel a, we can write the "map" function in our calculus as described in Figure 4. The main idea of this kind of encoding is to use the dynamic creation of names ν to create the return channel before calling a function, and then to use this channel to get back the result of this call. Note that we chose here as cost model the number of calls to f; this again shows the versatility of a tick construct, compared with a complexity notion relying only on the number of reduction steps.

With this process, on a list of length n, the work is n. However, as all calls to f can be done in parallel, the span is 1 for any non-empty input list.

```
!map(x, f, a).match(x) {
  []      ↦ a⟨x⟩ ;;
  y :: x1 ↦ (νb)(νc)( tick.f⟨y, b⟩ | map⟨x1, f, c⟩ | b(z).c(x2).a⟨z :: x2⟩ )
}
```
**Fig. 4.** The Map Function
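Under this cost model, the work and span of the map process are straightforward functions of the input length; the sketch below (function name ours, not from the paper) restates the counts claimed above:

```python
def map_costs(n):
    """Work and span of the Fig. 4 map process on a list of length n,
    under the cost model 'one tick per call to f'."""
    work = n                  # one call to f per element, summed up
    span = 1 if n > 0 else 0  # all calls to f can run in parallel
    return work, span

assert map_costs(0) == (0, 0)
assert map_costs(5) == (5, 1)
```

A sequential encoding of the same map, by contrast, would have span equal to its work.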

# **3 Size Types for the Work**

We now define a type system to bound the work of a process. The goal is to obtain a soundness result: if a process P is typable then we can derive an integer expression K such that the work of P is bounded by K.

# **3.1 Size Input/Output Types**

Our type system relies on indices to keep track of the size of values in a process. Such indices were used for example in [6] and are greatly inspired by [26]. The main idea of these types in a sequential setting is to control recursive calls by ensuring a decrease in sizes.

**Definition 4.** The set of indices for natural numbers is given by the following grammar.

$$I, J, K := i, j, k \mid f(I\_1, \ldots, I\_n)$$

The variables i, j, k are called index variables. The set of index variables is denoted V. The symbol f ranges over a given set of function symbols containing addition and multiplication. We also assume that we have subtraction as a function symbol, with n − m = 0 when m ≥ n. Each function symbol f of arity ar(f) comes with an interpretation ⟦f⟧ : ℕ^ar(f) → ℕ.

Given an index valuation ρ : V → ℕ, we extend the interpretation of function symbols to indices, noted ⟦I⟧ρ, in the natural way. In an index I, the substitution of the occurrences of i in I by J is denoted I{J/i}.
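The interpretation of indices can be sketched as a small evaluator (the representation and the symbol table are ours, not from the paper); note the truncated subtraction required by Definition 4:

```python
# Interpretations of the function symbols: '+' and '*' are required by
# Definition 4, and '-' is the assumed truncated subtraction.
INTERP = {
    '+': lambda a, b: a + b,
    '*': lambda a, b: a * b,
    '-': lambda a, b: max(a - b, 0),   # n - m = 0 when m >= n
}

def interp(index, rho):
    """Evaluate an index under the valuation rho : index variable -> N.
    An index is either a variable name or a tuple (f, I1, ..., In)."""
    if isinstance(index, str):
        return rho[index]
    f, *args = index
    return INTERP[f](*(interp(a, rho) for a in args))

rho = {'i': 2, 'j': 5}
assert interp(('-', 'i', 'j'), rho) == 0               # truncated: 2 - 5 = 0
assert interp(('+', 'i', ('*', 'j', 'j')), rho) == 27  # 2 + 5*5
```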

**Definition 5 (Constraints on Indices).** Let φ ⊂ V be a set of index variables. A constraint C on φ is an expression of the shape I ⋈ J, where I and J are indices with free variables in φ and ⋈ denotes a binary relation on integers. Usually, we take ⋈ ∈ {≤, <, =, ≠}. Finite sets of constraints are denoted Φ.

For a set φ ⊂ V, we say that a valuation ρ : φ → ℕ satisfies a constraint I ⋈ J on φ, noted ρ ⊨ I ⋈ J, when ⟦I⟧ρ ⋈ ⟦J⟧ρ holds. Similarly, ρ ⊨ Φ holds when ρ ⊨ C for all C ∈ Φ. Likewise, we note φ; Φ ⊨ C when, for all valuations ρ on φ such that ρ ⊨ Φ, we have ρ ⊨ C. Remark that the order ≤ in a context φ; Φ is not total in general: for example, (i, j); · ⊭ i ≤ ij and (i, j); · ⊭ ij ≤ i.

**Definition 6.** The set of base types is given by the following grammar.

$$\mathcal{B} := \mathsf{Nat}[I, J] \mid \mathsf{List}[I, J](\mathcal{B})$$

Intuitively, an integer n of type Nat[I, J] must be such that I ≤ n ≤ J. Likewise, a list of type List[I, J](B) must have a length between I and J. These types come with a notion of subtyping, in order to have some flexibility on bounds; it is described by the rules of Figure 5. In a subtyping judgement φ; Φ ⊢ T ⊑ T′, the free index variables of T, T′ and Φ should be included in φ.


**Fig. 5.** Subtyping Rules for Base Size Types
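For Nat, the subtyping of Figure 5 amounts to interval inclusion. The sketch below (ours, not from the paper) checks it pointwise for concrete bounds, i.e. for one fixed valuation of the index variables:

```python
def nat_subtype(lo, hi, lo2, hi2):
    """Nat[lo, hi] is a subtype of Nat[lo2, hi2] when the interval
    [lo, hi] is included in [lo2, hi2]: bounds may only widen."""
    return lo2 <= lo and hi <= hi2

assert nat_subtype(2, 10, 0, 10)      # [2,10] fits inside [0,10]
assert not nat_subtype(0, 10, 2, 10)  # [0,10] does not fit in [2,10]
```

In the type system itself, the inclusions lo2 ≤ lo and hi ≤ hi2 are discharged symbolically as judgements φ; Φ ⊨ I′ ≤ I and φ; Φ ⊨ J ≤ J′ rather than evaluated on numbers.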

Then, after base types, we have to give types to the channel names in a process. As we want to generalize subtyping to channel types, we use input/output types [34]. Intuitively, in such a type, in addition to the types that can be sent and received over a channel, a channel is given a set of capabilities: either it is both an input and output channel, or it has only one of those capabilities. This is useful for subtyping, as input channels and output channels do not behave in the same way with regard to subtyping: an input/output channel is invariant for subtyping, an input channel is covariant, and an output channel is contravariant. Unlike in usual input/output types, in this work we also distinguish two kinds of channels: simple channels (which we will often just call channels) and replicated channels (called servers).

**Definition 7.** The set of types is given by the following grammar.

$$T := \mathcal{B} \mid \mathsf{ch}(\tilde{T}) \mid \mathsf{in}(\tilde{T}) \mid \mathsf{out}(\tilde{T}) \mid \forall \tilde{i}. \mathsf{serv}^{K}(\tilde{T}) \mid \forall \tilde{i}. \mathsf{iserv}^{K}(\tilde{T}) \mid \forall \tilde{i}. \mathsf{oserv}^{K}(\tilde{T})$$

The three different types for channels and servers correspond to the three different sets of capabilities. We write serv when the server has both capabilities, iserv when it has only the input capability, and oserv when it has only the output capability. For servers, we have additional information: there is a quantification over index variables, and the index K stands for the complexity of the process spawned by this server. A typical example could be a server taking as input a list and a channel, and sending to this channel the sorted list, in time k · n where n is the size of the list: P = !a(x, b).··· b⟨e⟩, where the expression e at the end of the process is the list x sorted. Such a server name a could be given the type $\forall i.\mathsf{serv}^{k \cdot i}(\mathsf{List}[0,i](\mathcal{B}), \mathsf{out}(\mathsf{List}[0,i](\mathcal{B})))$. This type means that for all integers i, if given a list of size at most i and an output channel waiting for a list of size at most i, the process spawned by this server will stop in time at most k · i. These bound index variables ĩ are especially useful for replicated inputs. As a replicated input is meant to be used several times with different values, it is useful to allow this kind of polymorphism on indices. Moreover, if a replicated input is used to encode a recursion, this polymorphism lets us take into account the different recursive calls, with different values and different complexities.


**Fig. 6.** Subtyping Rules for Server Types

Then, we describe subtyping for servers in Figure 6. As explained previously, capabilities modify the variance of types, and a channel can lose capabilities by subtyping. Subtyping for channel types can be deduced from the rules for servers. Note that the transitivity rule is not necessary and the subtyping relation could be described exhaustively; however, in order to reduce the number of rules, we present subtyping with a transitivity rule. Finally, subtyping can be extended to contexts: we write Γ ⊑ Δ when Γ and Δ have the same domain and, for each variable with v : T in Γ and v : T′ in Δ, we have T ⊑ T′.

$$\frac{v : T \in \Gamma}{\phi; \Phi; \Gamma \vdash v : T} \qquad \frac{}{\phi; \Phi; \Gamma \vdash \mathtt{0} : \mathsf{Nat}[0,0]} \qquad \frac{}{\phi; \Phi; \Gamma \vdash [\,] : \mathsf{List}[0,0](\mathcal{B})}$$

$$\frac{\phi; \Phi; \Gamma \vdash e : \mathsf{Nat}[I,J]}{\phi; \Phi; \Gamma \vdash \mathtt{s}(e) : \mathsf{Nat}[I{+}1, J{+}1]}$$

$$\frac{\phi; \Phi; \Gamma \vdash e : \mathcal{B} \qquad \phi; \Phi; \Gamma \vdash e' : \mathsf{List}[I,J](\mathcal{B})}{\phi; \Phi; \Gamma \vdash e :: e' : \mathsf{List}[I{+}1, J{+}1](\mathcal{B})}$$

$$\frac{\phi; \Phi; \Delta \vdash e : U \qquad \phi; \Phi \vdash \Gamma \sqsubseteq \Delta \qquad \phi; \Phi \vdash U \sqsubseteq T}{\phi; \Phi; \Gamma \vdash e : T}$$

**Fig. 7.** Typing Rules for Expressions

We can now present the type system. Rules for expressions are given in Figure 7. The typing judgement φ; Φ; Γ ⊢ e : T means that under the constraints Φ, in the context Γ, the expression e can be given the type T. We use the notation φ; Φ; Γ ⊢ ẽ : T̃ for a sequence of typing judgements for the expressions in the tuple ẽ.

Then, rules for processes are described in Figure 8 and Figure 9. Figure 9 gives the rules specific to work, whereas the rules in Figure 8 will be reused for span. A typing judgement φ; Φ; Γ ⊢ P ◁ K intuitively means that under the constraints Φ, in the context Γ, the process P is typable and its work complexity is bounded by K.

The rules can be seen as a combination of input/output typing rules with rules found in a sized type system. The main differences are that, because of the two kinds of channels, we need two rules for an output, and, for servers, the quantification over index variables must be taken into account. Note that a replicated input has complexity zero: it is a call to this server that generates complexity in the type system. This is because, once defined, a replicated input persists throughout the reduction, so we do not want it to generate complexity by itself. Note also that the pattern-matching rules are the only ones that add constraints to the hypotheses; these constraints provide information on sizes in the typing, which is particularly useful for recursion. Finally, there is an explicit rule for subtyping, and in this rule we can arbitrarily increase the index corresponding to the complexity.

$$\frac{}{\phi; \Phi; \Gamma \vdash 0 \triangleleft 0} \qquad \frac{\phi; \Phi; \Gamma, a : T \vdash P \triangleleft K}{\phi; \Phi; \Gamma \vdash (\nu a)P \triangleleft K}$$

$$\frac{\phi; \Phi; \Gamma \vdash e : \mathsf{Nat}[I,J] \quad \phi; (\Phi, I \leq 0); \Gamma \vdash P \triangleleft K \quad \phi; (\Phi, J \geq 1); \Gamma, x : \mathsf{Nat}[I{-}1, J{-}1] \vdash Q \triangleleft K}{\phi; \Phi; \Gamma \vdash \mathtt{match}(e)\ \{\mathtt{0} \mapsto P;\; \mathtt{s}(x) \mapsto Q\} \triangleleft K}$$

$$\frac{\phi; \Phi; \Gamma \vdash e : \mathsf{List}[I,J](\mathcal{B}) \quad \phi; (\Phi, I \leq 0); \Gamma \vdash P \triangleleft K \quad \phi; (\Phi, J \geq 1); \Gamma, x : \mathcal{B}, y : \mathsf{List}[I{-}1, J{-}1](\mathcal{B}) \vdash Q \triangleleft K}{\phi; \Phi; \Gamma \vdash \mathtt{match}(e)\ \{[\,] \mapsto P;\; x :: y \mapsto Q\} \triangleleft K}$$

$$\frac{\phi; \Phi; \Delta \vdash P \triangleleft K \quad \phi; \Phi \vdash \Gamma \sqsubseteq \Delta \quad \phi; \Phi \vDash K \leq K'}{\phi; \Phi; \Gamma \vdash P \triangleleft K'}$$

**Fig. 8.** Common Typing Rules for Processes


**Fig. 9.** Work Typing Rules for Processes

### **3.2 Subject Reduction**

We now state the properties of this type system. We do not detail the proofs, as we will be more precise in the following sections with the type system for span. In the type system for work, we can easily obtain properties such as weakening and strengthening, and the fact that index variables can be substituted by any index in a typing derivation. Finally, substitution in processes preserves typing. With those properties, we obtain the usual subject reduction.

**Theorem 1 (Subject Reduction).** If φ; Φ; Γ ⊢ P ◁ K and P → Q, then φ; Φ; Γ ⊢ Q ◁ K.

Then, we also obtain the following theorem.

**Theorem 2 (Quantitative Subject Reduction).** If P →1 Q and φ; Φ; Γ ⊢ P ◁ K, then we have φ; Φ; Γ ⊢ Q ◁ K′ with φ; Φ ⊨ K′ + 1 ≤ K.

As a consequence, we almost immediately obtain that K is indeed a bound on the work of P whenever φ; Φ; Γ ⊢ P ◁ K.

Note that this soundness result is easily adaptable to similar type systems for work. As stated before, we can enrich the type system with other algebraic data types and the proof adapts easily. Moreover, we can drop the distinction between channels and servers and use the same typing for both, and we still obtain soundness. We decided to present this version as an introduction to the type system for span, but the system for work can be of interest in itself.

For example, an interesting consequence of this soundness theorem is that it immediately gives soundness for any subsystem. In particular, we detail in the technical report [5] a (slightly) weaker type system where the shape of types is restricted in order to obtain an inference procedure close to the one in [4].

# **4 Types for Parallel Complexity**

We present here a type system for span: as previously, we want a type system such that typing a process gives a bound on its span. Formally, we will prove the following theorem:

**Theorem 3 (Typing and Complexity).** Let P be a process and m its global parallel complexity. If we have φ; Φ; Γ ⊢ P ◁ K, then φ; Φ ⊨ K ≥ m.

Remark that this theorem is about open processes. However, our notion of complexity does not behave well on open processes. For example, the process match(v) {0 ↦ P;; s(x) ↦ Q} is in normal form for a variable v, so this process has global complexity 0. Still, we will also obtain the following corollary:

**Corollary 1 (Complexity and Open Processes).**

**–** If φ; Φ; Γ, ṽ : T̃ ⊢ P ◁ K, then for any sequence of expressions ẽ such that φ; Φ; Γ ⊢ ẽ : T̃, K is a bound on the global complexity of P[ṽ := ẽ].

**–** If φ; Φ; Γ ⊢ P ◁ K, then for any other annotated process Q such that φ; Φ; Γ ⊢ Q ◁ K′, max(K, K′) is a bound on the global complexity of P | Q.

So, when we give a typing φ; Φ; Γ ⊢ P ◁ K for an open process, we should not see K as a bound on the actual complexity of P, but as a bound on the complexity of this particular process in an environment respecting the types in Γ. So, in φ; Φ; v : Nat[2, 10] ⊢ match(v) {0 ↦ P;; s(x) ↦ Q} ◁ K, the index K is a bound on the complexity of this pattern matching under the assumption that the environment gives to v an integer value between 2 and 10.

### **4.1 Size Types with Time**

The type system is an extension of the previous one. In order to take parallelism into account, we need a way to synchronize the time between parallel processes; thus we add time information to types, as in [27] or [9].

**Definition 8.** The set of types and base types are given by the grammar:

$$\mathcal{B} := \mathsf{Nat}[I, J] \mid \mathsf{List}[I, J](\mathcal{B})$$

$$T := \mathcal{B} \mid \mathsf{ch}_I(\tilde{T}) \mid \mathsf{in}_I(\tilde{T}) \mid \mathsf{out}_I(\tilde{T}) \mid \forall_I \tilde{i}.\mathsf{serv}^K(\tilde{T}) \mid \forall_I \tilde{i}.\mathsf{iserv}^K(\tilde{T}) \mid \forall_I \tilde{i}.\mathsf{oserv}^K(\tilde{T})$$

As before, we have channel types, server types, and input/output capabilities in those types. For a channel type or a server type, the index I is called the time of this type. Giving a channel name the type ch_I(T̃) ensures that communication on this channel should happen within time I. For example, a channel name of type ch_0(T̃) should be used to communicate before any tick occurs. With this information, we can know when the continuation of an input will be available. Likewise, a server name of type ∀_I ĩ.iserv^K(T̃) should be used in a replicated input, and this replicated input should be ready to receive at any time greater than I. Typically, a process tick.!a(ṽ).P enforces that the type of a is ∀_I ĩ.iserv^K(T̃) with I at least one, as the replicated input is not ready to receive at time zero.

As before, we define a notion of subtyping on those types. The rules are essentially the same as the ones in Figures 5 and 6. The only difference is that we force the time of a type to be invariant in subtyping.

In order to write the typing rules, we need some other definitions to work with time in types. The first thing we need is a way to advance time.

**Definition 9 (Advancing Time in Types).** Given a set of index variables φ, a set of constraints Φ, a type T and an index I, we define T after I time units, denoted ⟨T⟩^{φ;Φ}_{−I}, by:

- ⟨B⟩^{φ;Φ}_{−I} = B
- ⟨ch_J(T̃)⟩^{φ;Φ}_{−I} = ch_{J−I}(T̃) if φ; Φ ⊨ J ≥ I. It is undefined otherwise. The cases of in_J(T̃) and out_J(T̃) are similar.
- ⟨∀_J ĩ.iserv^K(T̃)⟩^{φ;Φ}_{−I} = ∀_{J−I} ĩ.iserv^K(T̃) if φ; Φ ⊨ J ≥ I. It is undefined otherwise. The case of serv is similar.
- ⟨∀_J ĩ.oserv^K(T̃)⟩^{φ;Φ}_{−I} = ∀_{J−I} ĩ.oserv^K(T̃), where J−I denotes 0 when φ; Φ ⊨ I ≥ J.

This definition can be extended to contexts: ⟨v : T, Γ⟩^{φ;Φ}_{−I} = v : ⟨T⟩^{φ;Φ}_{−I}, ⟨Γ⟩^{φ;Φ}_{−I} if ⟨T⟩^{φ;Φ}_{−I} is defined, and ⟨v : T, Γ⟩^{φ;Φ}_{−I} = ⟨Γ⟩^{φ;Φ}_{−I} otherwise. We will often omit the φ; Φ in the notation when it is clear from the context.

Recall that, as the order ≤ on indexes is not total, φ; Φ ⊬ J ≥ I does not mean that φ; Φ ⊨ J < I.

Let us explain this definition. For base types, there is no time indication, thus nothing happens. One may then wonder what happens when the time J of T is smaller than I. For non-server channel types, we consider that their time is over, thus we erase them from the context. For servers it is a bit more complicated. Indeed, once a server is defined, it must stay available from then on; thus an output to a server should always be possible, no matter the time. Still, the input capability of a server should not be available eternally, as the time I is supposed to be the time by which a replicated input is effectively defined. So, once this time has passed, we should not be able to define a replicated input any more.
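To make the advance-time operator on contexts concrete, here is a small sketch in ordinary code. This is an illustration of ours, not part of the formal development: the encoding of types as `(kind, time)` pairs and the function name `advance_ctx` are hypothetical, and only the cases discussed above are covered.

```python
def advance_ctx(ctx, I):
    """Sketch of the operator on contexts: advance every entry by I time units.

    Each entry maps a name to a (kind, time) pair; base types carry no time.
    """
    out = {}
    for name, (kind, time) in ctx.items():
        if kind == 'base':                     # Nat, List: no time indication
            out[name] = (kind, time)
        elif kind in ('ch', 'in', 'out', 'iserv', 'serv'):
            if time >= I:                      # defined only when J >= I
                out[name] = (kind, time - I)
            # otherwise undefined: the entry is erased from the context
        elif kind == 'oserv':                  # outputs to servers never expire
            out[name] = (kind, max(time - I, 0))
    return out

ctx = {'n': ('base', None), 'a': ('ch', 3), 'b': ('ch', 0), 's': ('oserv', 1)}
print(advance_ctx(ctx, 1))
# {'n': ('base', None), 'a': ('ch', 2), 's': ('oserv', 0)} -- b is erased
```

Note how the channel b, whose time 0 is smaller than the advance 1, disappears, while the output server s stays with its time clamped at 0.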

**Definition 10 (Time Invariant Context).** Given a set of index variables φ and a set of constraints Φ, a context Γ is said to be time invariant when it contains only base type variables or output server types ∀_I ĩ.oserv^K(T̃) with φ; Φ ⊨ I = 0.

Such a context is thus invariant under the operator ⟨·⟩_{−I} for any I. This is typically the kind of context needed to define a server, as a server should not depend on the time at which it is called. We can now present the type system. Typing rules for expressions and some processes do not change; they can be found in Figure 7 and Figure 8. In Figure 10, we present the remaining rules of this type system, those that differ from the ones in Figure 9. As before, a typing judgement φ; Φ; Γ ⊢ P ⊲ K intuitively means that under the constraints Φ, in a context Γ, the process P is typable and its span complexity is bounded by K.

$$\frac{\phi; \Phi; \Gamma \vdash P \lhd K \qquad \phi; \Phi; \Gamma \vdash Q \lhd K}{\phi; \Phi; \Gamma \vdash P \mid Q \lhd K} \qquad \frac{\phi; \Phi; \langle \Gamma \rangle_{-1} \vdash P \lhd K}{\phi; \Phi; \Gamma \vdash \mathtt{tick}.P \lhd K + 1}$$

$$\frac{\phi; \Phi \vdash \Gamma \sqsubseteq \Delta, a : \mathsf{in}_I(\tilde{T}) \qquad \phi; \Phi; \langle \Delta \rangle_{-I}, \tilde{v} : \tilde{T} \vdash P \lhd K}{\phi; \Phi; \Gamma \vdash a(\tilde{v}).P \lhd K + I}$$

$$\frac{\phi; \Phi \vdash \langle \Gamma \rangle_{-I} \sqsubseteq \Delta, \Gamma' \qquad \Gamma' \text{ time invariant} \qquad (\phi, \tilde{i}); \Phi; \Gamma', \tilde{v} : \tilde{T} \vdash P \lhd K}{\phi; \Phi; \Gamma, a : \forall_I \tilde{i}.\mathsf{iserv}^K(\tilde{T}) \vdash\ !a(\tilde{v}).P \lhd I}$$

**Fig. 10.** Span Typing Rules for Processes

The rule for parallel composition shows that we consider parallel complexity, as we take the maximum over the two processes instead of their sum. In practice, we ask for the same complexity K in both branches of a parallel composition, but with the subtyping rule this indeed corresponds to the maximum. For replicated input (servers), we integrate some weakening on the context (Δ), and we require a time invariant context to type the server, as a server should not depend on time. Weakening is important since some types are not time invariant, such as channel types. So, we need to separate the time invariant types, which can be used in the continuation P, from the other types.

Some rules make time advance in their continuation, for example the tick rule or the input rule. This is expressed by the advance-time operator on contexts, and because time advances, the complexity also increases. Remark that, because of the advance of time, some channel names may disappear; thus there is a kind of "time uniqueness" for channels, contrary to the previous section. This will be detailed later. Also, note that in the rule for replicated input, there is an explicit subtyping in the premises. This is because ⟨Γ⟩_{−I} is not time invariant, since the type of a is at least ∀_0 ĩ.iserv^K(T̃) in this case. However, if this server has both input and output capabilities, we can give a time invariant type to a (or to other servers) just by removing the input capability, which can be done by subtyping.

Looking back at Corollary 1, we can for example understand the rule for input by taking the judgement φ; Φ; a : ch_3() ⊢ a().tick.0 ⊲ 4. This expresses that with an environment providing a message on a within 3 time units, this process terminates within 4 time units.

Finally, we can see that if we remove all size annotations and merge server types and channel types together, we get back the classical input/output types, and all the rules described here are admissible in the classical input/output type system for the π-calculus.

### **4.2 Examples**

**An Example to Justify the Use of Time.** In order to justify the use of time in types for span, and to show how we can find the time of a channel, we present here three examples of recursive calls with different behaviours. We do not detail a typing derivation here; a more detailed example is described later, in Section 5. Usually, type inference for a size type system reduces to satisfying a set of constraints on indices. We believe that even with time indexes on channels, type inference is still reducible to satisfying such a set of constraints. So, for the sake of simplicity, we describe this example with constraints.

We define three processes P1, P2 and P3 by:

$$P_i \equiv\ !a(n, r).\mathtt{tick}.\mathtt{match}(n) \left\{ 0 \mapsto \overline{r}\langle\rangle;\ \mathtt{s}(m) \mapsto (\nu r')(\nu r'')(Q_i) \right\}$$

for the following definitions of Q_i:

$$\begin{aligned} Q_1 &\equiv \overline{a}\langle m, r'\rangle \mid \overline{a}\langle m, r''\rangle \mid r'().r''().\overline{r}\langle\rangle \\ Q_2 &\equiv \overline{a}\langle m, r'\rangle \mid r'().\overline{a}\langle m, r''\rangle \mid r''().\overline{r}\langle\rangle \\ Q_3 &\equiv \overline{a}\langle m, r'\rangle \mid r'().(\overline{a}\langle m, r''\rangle \mid \overline{r}\langle\rangle) \mid r''().0 \end{aligned}$$

So intuitively, for P1 the two recursive calls are done in parallel after one unit of time, and the return signal on r is sent when both calls have sent their return signals on r' and r''. So, this is total parallelism for the two recursive calls (the span is linear in n). For P2, a first recursive call is done, then the process waits for the return signal on r', and when it receives it, the second recursive call begins. So, this is totally sequential (the span is exponential in n). Finally, for P3 we have an intermediate situation between totally parallel and totally sequential. The process starts with a recursive call. Then, it waits for the return signal on r'. When this signal arrives, it immediately starts the second recursive call and immediately sends the return signal on r. So, intuitively, the second recursive call starts when all the "left" calls have been done. Note that those three servers have the same work, which is exponential in n.
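These three behaviours can be made concrete by simulating the recursion schemes and measuring the resulting span. This is a rough timing model of ours in ordinary code, not the typed analysis; the function names and the convention "one time unit per tick" are our own assumptions.

```python
def span_q1(n):
    # Q1: both recursive calls start right after the tick, in parallel.
    if n == 0:
        return 0
    return 1 + max(span_q1(n - 1), span_q1(n - 1))

def span_q2(n):
    # Q2: the second call only starts once the first has signalled on r'.
    if n == 0:
        return 0
    return 1 + span_q2(n - 1) + span_q2(n - 1)

def span_q3(n):
    # Q3: returns (time when r is signalled, time when all activity ends).
    if n == 0:
        return (0, 0)
    ret1, end1 = span_q3(n - 1)       # first recursive call, after the tick
    ret = 1 + ret1                    # r is signalled as soon as r' fires
    ret2, end2 = span_q3(n - 1)       # second call starts at time 1 + ret1
    return (ret, max(1 + end1, 1 + ret1 + end2))

print(span_q1(10))   # → 10 (linear)
print(span_q2(10))   # → 1023 (exponential: 2^10 - 1)
print(span_q3(10))   # → (10, 55) (quadratic overall: n(n+1)/2)
```

The simulation matches the intuition: linear, exponential, and quadratic span for the same exponential work.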

So, let us type the three examples with the type system for span. For the sake of simplicity, we omit the typing of expressions; we only consider the difficult branch of the match constructors, and we focus on complexity and time. We consider the following context, which is used for the three processes:

$$\Gamma \equiv a : \forall_0 i.\mathsf{oserv}^{f(i)}(\mathsf{Nat}[0, i], \mathsf{ch}_{g(i)}()),\; n : \mathsf{Nat}[0, i],\; r : \mathsf{ch}_{g(i)}()$$

We have two unknown function symbols: f, which represents the complexity of the server, and g, which gives the time of the return channel. We also use this second context:

$$\Delta \equiv \langle \Gamma \rangle_{-1},\; m : \mathsf{Nat}[0, i-1],\; r' : \mathsf{ch}_{g'(i)}(),\; r'' : \mathsf{ch}_{g''(i)}()$$

This gives two more unknown functions, g' and g'', corresponding respectively to the time of r' and of r'' when defined. The three processes start with the same typing. We use a double line to express that we do not use a real typing rule, so we can omit some premises or apply several typing rules simultaneously.


The first thing to remark is that the typing applies the tick rule. In this rule, the complexity at the bottom should have the shape K + 1 for some K, so here we obtain immediately that f(i) ≥ 1. In the same way, r should still be defined in ⟨Γ⟩_{−1}, so by definition of time advance, g(i) ≥ 1.

Then, for the three processes, the typing gives the following conditions on the indices, for i ≥ 1. For Q1:

$$f(i) - 1 \ge f(i-1) \qquad g'(i) = g(i-1) \qquad g''(i) = g(i-1)$$

$$g''(i) \ge g'(i) \qquad g(i) - 1 \ge g''(i) \qquad f(i) - 1 \ge g(i) - 1$$

The first constraint is because the total complexity f(i)−1 must be greater than the complexity f(i−1) of the two recursive calls. Then, r' and r'' must have a time equal to g(i−1) in order to correspond to the type of a in the outputs ā⟨m, r'⟩ and ā⟨m, r''⟩. Finally, as r'' waits for an input after r', the time of r'' must be greater than the time of r'. Similarly, the time of r (which is equal to g(i)−1 after the tick) must be greater than the time of r'', and the total complexity f(i)−1 must be greater than the complexity of r'().r''().r̄⟨⟩, which is equal to the time of r''. So, we can satisfy the conditions with the following choice:

$$f(i) \equiv i+1 \qquad g(i) \equiv i+1 \qquad g'(i) \equiv g''(i) \equiv i$$

So, as expected, the span, represented by the function f, is indeed linear.

Then, for Q2, the second call is delayed by g'(i) time units because we need to wait for r'. Thus, we obtain the following constraints.

$$\begin{aligned} f(i) - 1 &\ge f(i-1) & g'(i) &= g(i-1) & f(i) - 1 &\ge g'(i) + f(i-1) \\ g''(i) - g'(i) &= g(i-1) & g(i) - 1 &\ge g''(i) & f(i) - 1 &\ge g(i) - 1 \end{aligned}$$

This delay of g'(i) time units can be seen in the third and fourth constraints. Again, the third constraint is because the complexity should be greater than the delayed complexity of the second call, and the fourth because the type of r'' should correspond to the type in a. Thus, we can take

$$f(i) \equiv 2^{i+1} - 1 \qquad g(i) \equiv 2^{i+1} - 1$$

So, we indeed obtain the exponential complexity.

However, in those two examples, the time of the channel r is always equal to the complexity of the server a, so we cannot really see the usefulness of time. With the next example we obtain something more interesting. For Q3, this time the fifth constraint, on g(i) (depending on when the output on r is done), is different, and we obtain:

$$f(i) - 1 \ge f(i-1) \qquad g'(i) = g(i-1) \qquad f(i) - 1 \ge g'(i) + f(i-1)$$

$$g''(i) - g'(i) = g(i-1) \qquad g(i) - 1 \ge g'(i) \qquad f(i) - 1 \ge g(i) - 1 \qquad f(i) - 1 \ge g''(i)$$

The last constraint is because, again, the complexity should be greater than the time of r''. So, using the equalities, and by removing redundant inequalities, we obtain for f and g:

$$f(i) \ge 1 + g(i-1) + f(i-1) \qquad g(i) \ge 1 + g(i-1) \qquad f(i) \ge 1 + 2 \cdot g(i-1)$$

Thus, we can take:

$$g(i) \equiv i + 1 \qquad f(i) \equiv \frac{(i+1)(i+2)}{2}$$

The complexity is thus quadratic in n. Note that for this example, the complexity f depends directly on g, and g is given by a recursive equation independent of f. So, in a sense, to find the complexity we first need to find the delay of the second recursive call. Without time indications on channels, it would not be possible to track and obtain this recurrence relation on g, and thus we could not deduce the complexity.
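The three candidate solutions can be checked mechanically against their constraint sets. The script below is a sanity check of ours; the names `f1`, `gp1` (for g'), `gpp1` (for g''), and so on are ad hoc transcriptions of the functions above for each Q_i.

```python
# Candidate solutions transcribed from the text.
f1 = lambda i: i + 1;          g1 = lambda i: i + 1
gp1 = lambda i: i;             gpp1 = lambda i: i
f2 = lambda i: 2**(i+1) - 1;   g2 = lambda i: 2**(i+1) - 1
gp2 = lambda i: g2(i - 1);     gpp2 = lambda i: gp2(i) + g2(i - 1)
g3 = lambda i: i + 1;          f3 = lambda i: (i + 1) * (i + 2) // 2
gp3 = lambda i: g3(i - 1);     gpp3 = lambda i: gp3(i) + g3(i - 1)

for i in range(1, 20):
    # Q1: fully parallel recursive calls (linear span)
    assert f1(i) - 1 >= f1(i - 1) and gp1(i) == g1(i - 1) == gpp1(i)
    assert gpp1(i) >= gp1(i) and g1(i) - 1 >= gpp1(i) and f1(i) - 1 >= g1(i) - 1
    # Q2: fully sequential recursive calls (exponential span)
    assert f2(i) - 1 >= f2(i - 1) and gp2(i) == g2(i - 1)
    assert f2(i) - 1 >= gp2(i) + f2(i - 1)
    assert gpp2(i) - gp2(i) == g2(i - 1)
    assert g2(i) - 1 >= gpp2(i) and f2(i) - 1 >= g2(i) - 1
    # Q3: intermediate situation (quadratic span)
    assert f3(i) - 1 >= f3(i - 1) and gp3(i) == g3(i - 1)
    assert f3(i) - 1 >= gp3(i) + f3(i - 1)
    assert gpp3(i) - gp3(i) == g3(i - 1)
    assert g3(i) - 1 >= gp3(i) and f3(i) - 1 >= g3(i) - 1
    assert f3(i) - 1 >= gpp3(i)
print("all constraints satisfied")
```

Each loop iteration checks exactly the inequalities displayed above for Q1, Q2 and Q3; the chosen f is linear, exponential and quadratic respectively.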

Note that the first two examples used channels as return signals for a parallel computation, whereas in the last example a channel is used as a synchronization point in the middle of a computation. We believe that this flexibility of channels justifies the use of the π-calculus to reason about parallel computation. Moreover, this work is a step toward a more expressive type system inspired by [27], taking into account concurrent behaviour. Indeed, as we will show, the current type system fails to capture some simple concurrency.

**Limitations of the Type System.** Our current type system enforces some kind of time uniqueness for channels. Indeed, take the process a().tick.ā⟨⟩. When trying to type this process, we obtain:

$$\frac{\cdot; \cdot \vdash \mathsf{ch}_I() \sqsubseteq \mathsf{in}_I()}{\cdot; \cdot;\ a : \mathsf{ch}_I() \vdash a : \mathsf{in}_I()} \qquad \frac{\dfrac{\text{Error}}{\cdot; \cdot;\ \langle a : \mathsf{ch}_0() \rangle_{-1} \vdash \overline{a}\langle\rangle \lhd 0}}{\cdot; \cdot;\ a : \mathsf{ch}_0() \vdash \mathtt{tick}.\overline{a}\langle\rangle \lhd 1}$$

As by definition ⟨a : ch_0()⟩_{−1} is ∅, we cannot type the output on a. So, channels have strong constraints on the time at which they can be used. This is true especially when channels are not used linearly. Still, note that we can type a process of the shape a().0 | ā⟨⟩ | tick.ā⟨⟩, so this is better than plain linearity on channels. This restriction limits the examples of concurrent behaviours we can capture. For example, take two processes P1 and P2 that should be executed but not simultaneously. In order to do that in a concurrent setting, we can use semaphores. In the π-calculus, we could consider the process (νa)(a().P1' | a().P2' | ā⟨⟩), where P1' is P1 with an output ā⟨⟩ at the end, and likewise for P2'. This is a way to simulate semaphores in the π-calculus. Now, this example has the same problem as the one given above as soon as, for example, P1 contains a tick; thus we cannot type this kind of process.

Still, we believe that for parallel computation our type system should be quite expressive in practice. Indeed, as stated above, the restriction appears especially when channels are not used linearly. However, it is known that the linear π-calculus in itself is expressive for parallel computation [31]. For example, classical encodings of functional programs in a parallel setting rely on the use of linear return signals, as we will see in the example of bitonic sort in Sect. 5. Moreover, session types can also be encoded in the linear π-calculus in the presence of variant types [28,8]. Note that in order to encode a calculus such as the one in [9], we would also need recursive types. Our calculus and its proof of soundness could be extended to variant types, but not straightforwardly to recursive types. However, we believe the results on the linear π-calculus cited above suggest that this restriction should not be too harmful for parallel computation.

### **4.3 Complexity Results**

In this section, we show that our type system indeed gives a bound on the number of time reduction steps of a process under the maximal progress assumption. We only give intuitions about the proofs here; the detailed proofs can be found in the technical report [5].

In the following, as we work with the reduction ⇒p, we need to consider annotated processes instead of simple processes. So, we enrich our type system with a rule for the constructor n : P.

$$\frac{\phi; \Phi; \langle \varGamma \rangle\_{-n} \vdash P \lhd K}{\phi; \Phi; \varGamma \vdash n: P \lhd K + n}$$

As the intuition suggests, this rule is equivalent to n applications of the typing rule for tick. We can now work on the properties of our type system on annotated processes.

The procedure to prove subject reduction for ⇒p in this type system is intrinsically more difficult than the one for Theorem 1. Indeed, from the proof of subject reduction for span, one can deduce the proof of subject reduction for work just by forgetting the considerations about time and the constructor n : P. Thus, in the technical report, only the proof for span is detailed.

Again, we have both weakening and strengthening in this type system. We also have a property specific to size type systems, expressing that an index variable can be substituted by any index. We also need a lemma specific to the notion of time.

**Definition 11 (Delaying).** Given a type T and an index I, we define the delaying of T by I units of time, denoted T_{+I}:

$$\mathcal{B}_{+I} = \mathcal{B} \qquad (\mathsf{ch}_J(\tilde{T}))_{+I} = \mathsf{ch}_{J+I}((\tilde{T})_{+I})$$

and for other channel and server types, the definition is in correspondence with the one on the right above. This definition can be extended to contexts.

**Lemma 2 (Delaying).** If φ; Φ; Γ ⊢ P ⊲ K then φ; Φ; Γ_{+I} ⊢ P ⊲ K + I.

With this lemma, we can see that adding a delay of I time units to all channels in the context increases the complexity by I time units; this shows the link between time in types and complexity. Then, we can show the usual substitution lemma.

**Lemma 3 (Substitution).**

1. If φ; Φ; Γ, v : T ⊢ e' : U and φ; Φ; Γ ⊢ e : T then φ; Φ; Γ ⊢ e'[v := e] : U.
2. If φ; Φ; Γ, v : T ⊢ P ⊲ K and φ; Φ; Γ ⊢ e : T then φ; Φ; Γ ⊢ P[v := e] ⊲ K.

Finally, we can show that typing behaves well with congruence.

**Lemma 4 (Congruence and Typing).** Let P and Q be annotated processes such that P ≡ Q. Then, φ; Φ; Γ ⊢ P ⊲ K if and only if φ; Φ; Γ ⊢ Q ⊲ K.

And with all this, we obtain the subject reduction.

**Theorem 4 (Subject Reduction).** If φ; Φ; Γ ⊢ P ⊲ K and P ⇒p Q then φ; Φ; Γ ⊢ Q ⊲ K.

The proof proceeds by induction on P ⇒p Q. It can be rather tedious because subtyping and input/output types generate a lot of cases, and, as expected, the most difficult cases are the ones for communication.

Now that we have the subject reduction for ⇒p, we can easily deduce a more generic form of Theorem 3.

**Theorem 5.** Let P be an annotated process and m be its global parallel complexity. Then, for a typing φ; Φ; Γ ⊢ P ⊲ K, we have φ; Φ ⊨ K ≥ m.

Corollary 1 is then obtained with the substitution lemma and the rule for parallel composition.

# **5 An Example: Bitonic Sort**

As an example for this type system, we show how to obtain the bound for a classical parallel algorithm: bitonic sort [1]. The particularity of this sorting algorithm is that it admits a parallel complexity in O(log(n)²). We show that our type system allows us to derive this bound for the algorithm, just as a paper-and-pen analysis would. Actually, we consider here a version for lists, which is not optimal in the number of operations, but we obtain the usual number of comparisons. For the sake of simplicity, we present the algorithm for lists whose size is a power of 2. Let us briefly sketch the ideas of this algorithm; for a formal description, see [1].


We encode this algorithm in the π-calculus with a boolean type. As explained before, our results can easily be extended to support booleans with a conditional constructor.

First, we suppose that a server lessthan for comparison is already implemented. We start with bcompare, which, given two lists of the same length, creates the list of pointwise minima and the list of pointwise maxima. This is described in Figure 11.

We present the typing intuitively. To begin with, we suppose that lessthan is given the server type ∀_0.oserv⁰(B, B, ch_0(Bool)), saying that this is a server ready to be called, which takes as input two values and a channel used to return the boolean result. With this, we can give bcompare the following server type:

$$\forall_0 i.\mathsf{serv}^1(\mathsf{List}[0, i](\mathcal{B}), \mathsf{List}[0, i](\mathcal{B}), \mathsf{out}_1(\mathsf{List}[0, i](\mathcal{B}), \mathsf{List}[0, i](\mathcal{B})))$$

```
!bcompare(l1, l2, a).match(l1) {
  [] → a⟨l1, l2⟩;;
  x :: l1' → match(l2) {
    [] → a⟨l1, l2⟩;;
    y :: l2' → (νb)(νc)(
      bcompare⟨l1', l2', b⟩ | tick.lessthan⟨x, y, c⟩
      | b(lm, lM).c(z).if z then a⟨x :: lm, y :: lM⟩ else a⟨y :: lm, x :: lM⟩
    )
  }
}
!bmerge(up, l, a).match(l) {
  []  → a⟨l⟩;;
  [y] → a⟨l⟩;;
  _   → let (l1, l2) = partition(l) in (νb)(νc)(νd)(
    bcompare⟨l1, l2, b⟩ | b(p1, p2).(bmerge⟨up, p1, c⟩ | bmerge⟨up, p2, d⟩)
    | c(q1).d(q2).if up then let l' = q1 @ q2 in a⟨l'⟩
                  else let l' = q2 @ q1 in a⟨l'⟩
  )
}
!bsort(up, l, a).match(l) {
  []  → a⟨l⟩;;
  [y] → a⟨l⟩;;
  _   → let (l1, l2) = partition(l) in (νb)(νc)(νd)(
    bsort⟨tt, l1, b⟩ | bsort⟨ff, l2, c⟩
    | b(q1).c(q2).let q = q1 @ q2 in bmerge⟨up, q, d⟩ | d(p).a⟨p⟩
  )
}
```
**Fig. 11.** Bitonic Sort

The important things to notice are that this server has complexity 1, and that the channel taken as input has time 1. In order to verify that this type is correct, we first apply the rule for replicated input. Let us denote by Γ the hypotheses on those two server names, and let Γ' be as Γ except that for bcompare we only keep the output capability. Then, Γ' is indeed time invariant, and we have ⟨Γ⟩_{−0} ⊑ Γ', so we can continue the typing in this context Γ'. Then, we need to show that the process after the replicated input indeed has complexity 1. In the cases of empty lists, this can be done easily. In the non-empty case, for the ν constructor, we must give a type to the channels b and c. We use:

$$b: \mathsf{ch}\_1(\mathsf{List}[0, i-1](\mathcal{B}), \mathsf{List}[0, i-1](\mathcal{B})) \qquad c: \mathsf{ch}\_1(\mathsf{Bool})$$

And we can then type the different processes in parallel.

**–** For the call to bcompare, the arguments have the expected type, and this call has complexity 1 because of the type of bcompare.


So, we can indeed give this server type to bcompare, and thus we can call this server and it generates a complexity of 1.

Then, to present the process for bitonic sort, let us use the macro let ṽ = f(ẽ) in P to represent (νa)(f⟨ẽ, a⟩ | a(ṽ).P), and let us also use a generalized pattern matching. We also assume that we have a function for the concatenation of lists, and a function partition taking a list of size 2n and giving back the two lists corresponding to its first n elements and its last n elements. The process for bitonic sort is then given in Figure 11.

Without going into details, the main point in typing those servers is to find a solution to a recurrence relation for the complexity in server types. In the typing of bmerge, we suppose given a list of size smaller than 2^i, and we choose both the complexity of this type and the time of the channel a equal to a certain index K (with i free in K). So, it means we choose for bmerge the type:

$$\forall_0 i.\mathsf{serv}^K(\mathsf{Bool}, \mathsf{List}[0, 2^i](\mathcal{B}), \mathsf{out}_K(\mathsf{List}[0, 2^i](\mathcal{B})))$$

Then, the typing gives us the following condition.

$$i \ge 1 \text{ implies } K \ge 1 + K\{i - 1/i\}$$

Indeed, the two recursive calls to bmerge are done after one unit of time (because the input b(p1, p2) takes one unit of time, as expressed by the type of bcompare), and on lists of size 2^{i−1}. Then, the continuation after those recursive calls (the process after c(q1).d(q2)) does not generate any complexity. So, we can take K = i, and thus bmerge has logarithmic complexity. In the same way, we obtain a recurrence relation for the complexity K' of bsort on an input list of size smaller than 2^i:

$$i \ge 1 \text{ implies } K' \ge K' \{i - 1/i\} + i$$

Again, the two recursive calls are done on lists of size 2^{i−1}. This time, the delay of i in the recurrence relation comes from the continuation, because of the call to bmerge, which generates a complexity of i. Thus, we can take K' in O(i²), and we obtain in the end that bitonic sort is indeed in O(log(n)²) on a list of size n.

Remark that in this example, the type system gives recurrence relations corresponding to the usual ones we would obtain in a complexity analysis by hand. Here, the recurrence relation is only on K because channel names are used only as return channels, so their time is always equal to the complexity of the server that uses them. In general this is not the case, as we saw before, so when defining a server we obtain, in general, mutually recursive relations.
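The recurrences above can be cross-checked with a direct simulation of the algorithm that also counts the span, charging one time unit per layer of parallel comparisons. This is a sequential model of ours mirroring the processes of Figure 11, not the typed analysis itself.

```python
import math
import random

def bmerge(up, l):
    """Bitonic merge; returns (merged list, span). Input length is a power of 2."""
    if len(l) <= 1:
        return l, 0
    h = len(l) // 2
    lo = [min(a, b) for a, b in zip(l[:h], l[h:])]   # bcompare: span 1, all
    hi = [max(a, b) for a, b in zip(l[:h], l[h:])]   # comparisons in parallel
    q1, s1 = bmerge(up, lo)                          # the two recursive merges
    q2, s2 = bmerge(up, hi)                          # run in parallel
    return (q1 + q2 if up else q2 + q1), 1 + max(s1, s2)

def bsort(up, l):
    """Bitonic sort; returns (sorted list, span)."""
    if len(l) <= 1:
        return l, 0
    h = len(l) // 2
    a, s1 = bsort(True, l[:h])                       # the two recursive sorts
    b, s2 = bsort(False, l[h:])                      # run in parallel
    m, sm = bmerge(up, a + b)
    return m, max(s1, s2) + sm

xs = [random.randrange(100) for _ in range(16)]      # n = 2^4, so i = 4
res, span = bsort(True, xs)
assert res == sorted(xs)
i = int(math.log2(len(xs)))
assert span == i * (i + 1) // 2                      # K' = i(i+1)/2, O(log² n)
```

The measured span of bmerge is exactly i for lists of size 2^i, and the span of bsort is i(i+1)/2, matching the solutions of the two recurrences.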

# **6 Related Work**

An analysis of the complexity of parallel functional programs based on types has been carried out in [23]. Their system can analyse the work and the span (called depth in that paper), and makes use of amortized complexity analysis, which allows them to obtain sharp bounds. However, the kind of parallelism they analyse is limited to parallel composition. So, on the one hand we consider a more general model of parallelism, and on the other hand we do not take advantage of amortized analysis as they do. The paper [17] proposes a complexity analysis of parallel functional programs written in interaction nets, a graph-based language derived from linear logic. Their analysis is based on size types. However, their model is also quite different from ours, as interaction nets do not provide name-passing.

Other works like [2] tackle the problem of analysing the parallel complexity of a distributed system by building a distributed flow graph and searching for a path of maximal cost in this graph. Another approach to analyse loops with concurrency in an actor-based language is done by rely-guarantee reasoning [3]. Those approaches give interesting results on some classes of systems, but they cannot be directly applied to the π-calculus language we are considering, with dynamic creation of processes and channels. Moreover, they do not offer the same compositionality as analysis based on type systems. The paper [16] studies distributed systems that are comparable to those of [2], and analyses their complexity by means of a behaviour type system. In a second step the types are used to run an analysis that returns complexity bounds. So this approach is more compositional than that of [2], but still does not apply to our π-calculus language.

Let us now turn to related works in the setting of the π-calculus or process calculi. To our knowledge, the first work to study parallel complexity in the π-calculus by types was given by Kobayashi [27], as another application of his type system for deadlock freedom, further developed in other papers [30]. In his setting, channels are typed with usages, which are simple CCS-like processes describing the behaviour of a channel. In order to carry out complexity analysis, those usages are annotated with two pieces of time information, obligation and capability. The obligation level is the time at which a channel is ready to perform an action, and the capability level is the time at which it successfully finds a communication partner. We believe that when they are not infinite, the sum of those levels is related to our own time annotation of channels. The definition of parallel complexity in this work differs from ours, as it loses some non-deterministic paths, and the extension with dependent types is suggested but not detailed. It is not clear to us whether everything can be adapted to reason only about our parallel complexity, but we plan to study it in future work. More recently, Das et al. [9,10] proposed a type system with temporal session types to capture several parallel cost models with the use of a tick constructor. Our usage of time was inspired by their types with the usual next modality of temporal logic, but they also use the always and eventually modalities to gain expressivity. We believe that, because our usage of time is more permissive, those modalities would not be useful in our calculus. Because of session types, they have linearity for the use of data types such as lists, but, contrary to our calculus, they obtain deadlock freedom. Moreover, they provide decidable operations to simplify the use of their types, such as subtyping, but they define neither dependent types nor size types, which are useful to treat data types.
Still, they provide a significant number of examples to show the expressivity of their type system.

The methodology of our work is inspired by implicit computational complexity, which aims at characterizing complexity classes by means of dedicated programming languages, mainly in the sequential setting, for instance by providing languages for FPTIME functions. Some results have been adapted to the concurrent case, but mainly for the work complexity or for languages other than the π-calculus, e.g. [32,14,7] (the latter reference is for a higher-order π-calculus). The paper [13] is closer to our setting, as it defines a notion of causal complexity in the π-calculus and gives a type system characterizing processes with polynomial complexity. However, contrary to those works, we do not restrict to a particular complexity class (such as FPTIME) and we handle the case of the span.

Technically, the types we use are inspired by linear dependent types [6], one of the many variants of sized types, which were introduced in [26].

# **7 Perspectives**

We see several possible future directions for this work:


**Acknowledgements** We are grateful to Naoki Kobayashi for suggesting the definition of annotated processes and their reduction that we use in this paper.

This work was supported by the LABEX MILYON (ANR-10-LABX-0070) of Université de Lyon.

# **References**


Languages, POPL 2016, St. Petersburg, FL, USA, January 20–22, 2016, pp. 243–255 (2016)


34. Sangiorgi, D., Walker, D.: The Pi-calculus: A Theory of Mobile Processes. Cambridge University Press (2003)

**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/ 4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Checking Robustness Between Weak Transactional Consistency Models**

Sidi Mohamed Beillahi, Ahmed Bouajjani, and Constantin Enea

Université de Paris, IRIF, CNRS, Paris, France, {beillahi,abou,cenea}@irif.fr

**Abstract.** Concurrent accesses to databases are typically encapsulated in transactions in order to enable isolation from other concurrent computations and resilience to failures. Modern databases provide transactions with various semantics corresponding to different trade-offs between consistency and availability. Since a weaker consistency model provides better performance, an important issue is investigating the weakest level of consistency needed by a given program (to satisfy its specification). As a way of dealing with this issue, we investigate the problem of checking whether a given program has the same set of behaviors when replacing a consistency model with a weaker one. This property, known as robustness, generally implies that any specification of the program is preserved when weakening the consistency. We focus on the robustness problem for consistency models which are weaker than standard serializability, namely, causal consistency, prefix consistency, and snapshot isolation. We show that checking robustness between these models is polynomial-time reducible to a state reachability problem under serializability. We use this reduction to also derive a pragmatic proof technique based on Lipton's reduction theory that allows proving programs robust. We have applied our techniques to several challenging applications drawn from the literature on distributed systems and databases.

**Keywords:** Transactional databases · Weak consistency · Program verification

# **1 Introduction**

Concurrent accesses to databases are typically encapsulated in transactions in order to enable isolation from other concurrent computations and resilience to failures. Modern databases provide transactions with various semantics corresponding to different tradeoffs between consistency and availability. The strongest consistency level is achieved with serializable transactions [42] whose outcome in concurrent executions is the same as if the transactions were executed atomically in some order. Since serializability (SER) carries a significant penalty on availability, modern databases often provide weaker consistency models, e.g.,

This work is supported in part by the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No 678177).

<sup>©</sup> The Author(s) 2021

N. Yoshida (Ed.): ESOP 2021, LNCS 12648, pp. 87–117, 2021. https://doi.org/10.1007/978-3-030-72019-3_4

causal consistency (CC) [38], prefix consistency (PC) [22, 25], and snapshot isolation (SI) [12]. Causal consistency requires that if a transaction t1 "affects" another transaction t2, e.g., t1 executes before t2 in the same session or t2 reads a value written by t1, then the updates in these two transactions are observed by any other transaction in this order. Concurrent transactions, which are not causally related to each other, can be observed in different orders, leading to behaviors that are not possible under SER. Prefix consistency requires that there is a total commit order between all the transactions such that each transaction observes all the updates in a prefix of this sequence (PC is stronger than CC). Two transactions can observe the same prefix, which leads to behaviors that are not admitted by SER. Snapshot isolation further requires that two different transactions observe different prefixes if they both write to a common variable.

Since a weaker consistency model provides better performance, an important issue is identifying the weakest level of consistency needed by a program (to satisfy its specification). One way to tackle this issue is checking whether a program P designed under a consistency model S has the same behaviors when run under a weaker consistency model W. This property of a program is generally known as robustness against substituting S with W. It implies that any specification of P is preserved when weakening the consistency model (from S to W). Preserving any specification is convenient since specifications are rarely present in practice.

The problem of checking robustness for a given program has been investigated in several recent works, but only when the stronger model (S) is SER, e.g., [9, 10, 19, 26, 13, 40], or sequential consistency in the non-transactional case, e.g. [36, 15, 29]. However, there is a large class of specifications that can be implemented even in the presence of "anomalies", i.e., behaviors which are not admitted under SER (see [46] for a discussion). In this context, an important question is whether a certain implementation (program) is robust against substituting a weak consistency model, e.g., SI, with a weaker one, e.g., CC.

In this paper, we consider the sequence of increasingly strong consistency models mentioned above, CC, PC, and SI, and investigate the problem of checking robustness for a given program against weakening the consistency model to one in this range. We study the asymptotic complexity of this problem and propose effective techniques for establishing robustness based on abstraction. There are two important cases to consider: robustness against substituting SI with PC and PC with CC, respectively. Robustness against substituting SI with CC can be obtained as the conjunction of these two cases.

In the first case (SI vs PC), checking robustness for a program P is reduced to a reachability (assertion checking) problem in a composition of P under PC with a monitor that checks whether a PC behavior is an "anomaly", i.e., admitted by P under PC, but not under SI. This approach raises two non-trivial challenges: (1) defining a monitor for detecting PC vs SI anomalies that uses a minimal amount of auxiliary memory (to remember past events), and (2) determining the complexity of checking if the composition of P with the monitor reaches a specific control location<sup>1</sup> under the (weaker) model PC. Interestingly enough,

<sup>1</sup> We assume that the monitor goes to an error location when detecting an anomaly.

we address these two challenges by studying the relationship between these two weak consistency models, PC and SI, and serializability. The construction of the monitor is based on the fact that the PC vs SI anomalies can be defined as, roughly, the difference between the PC vs SER and SI vs SER anomalies (investigated in previous work [13]), and we show that the reachability problem under PC can be reduced to a reachability problem under SER. These results lead to a polynomial-time reduction of this robustness problem (for arbitrary programs) to a reachability problem under SER, which is important from a practical point of view since the SER semantics (as opposed to the PC or SI semantics) can be encoded easily in existing verification tools (using locks to guard the isolation of transactions). These results also enable a precise characterization of the complexity class of this problem.

Checking robustness against substituting PC with CC is reduced to the problem of checking robustness against substituting SER with CC. The latter has been shown to be polynomial-time reducible to reachability under SER in [10]. This surprising result relies on the reduction from PC reachability to SER reachability mentioned above. This reduction shows that a given program P reaches a certain control location under PC iff a transformed program P′, where essentially each transaction is split into two parts, one part containing all the reads and one part containing all the writes, reaches the same control location under SER. Since this reduction preserves the structure of the program, CC vs PC anomalies of a program P correspond to CC vs SER anomalies of the transformed program P′.

Beyond enabling these reductions, the characterization of classes of anomalies or the reduction from the PC semantics to the SER semantics are also important for a better understanding of these weak consistency models and the differences between them. We believe that these results can find applications beyond robustness checking, e.g., verifying conformance to given specifications.

As a more pragmatic approach for establishing robustness, which avoids a non-reachability proof under SER, we have introduced a proof methodology that builds on Lipton's reduction theory [39] and the concept of commutativity dependency graph introduced in [9], which represents mover type dependencies between the transactions in a program. We give sufficient conditions for robustness in all the cases mentioned above, which characterize the commutativity dependency graph associated to a given program.

We tested the applicability of these verification techniques on a benchmark containing seven challenging applications extracted from previous work [30, 34, 19]. These techniques are precise enough for proving or disproving the robustness of all these applications, for all combinations of the consistency models.

Complete proofs and more details can be found in [11].

# **2 Overview**

We give an overview of the robustness problems investigated in this paper, discussing first the case PC vs. CC, and then SI vs PC. We end with an example that illustrates the robustness checking technique based on commutativity arguments.

Fig. 1: Transactional programs and traces under different consistency models.

**Robustness** PC **vs** CC**.** We illustrate the robustness against substituting PC with CC using the FusionTicket and the Twitter programs in Figure 1a and Figure 1c, respectively. FusionTicket manages tickets for a number of events, each event being associated with a venue. Its state consists of a two-dimensional map that stores the number of tickets for an event in a given venue (r is a local variable, and the assignment in CountTickets is interpreted as a read of the shared state). The program has two processes and each process contains two transactions. The first transaction creates an event e in a venue v with a number of tickets n, and the second transaction computes the total number of tickets for all the events in a venue v. A possible candidate for a specification of this program is that the values computed in CountTickets are monotonically increasing since each such value is computed after creating a new event. Twitter provides a transaction for registering a new user with a given username and password, which is executed by two parallel processes. Its state contains two maps that record whether a given username has been registered (0 and 1 stand for non-registered and registered, respectively) and the password for a given username. Each transaction first checks whether a given username is free (see the assume statement). The intended specification is that the user must be registered with the given password when the registration transaction succeeds.

A program is robust against substituting PC with CC if its set of behaviors under the two models coincide. We model behaviors of a given program as traces, which record standard control-flow and data-flow dependencies between transactions, e.g., the order between transactions in the same session and whether a transaction reads the value written by another (read-from). The transitive closure of the union of all these dependency relations is called happens-before. Figure 1b pictures a trace of FusionTicket where the concrete values which are read in a transaction are written under comments. In this trace, each process registers a different event but in the same venue and with the same number of tickets, and it ignores the event created by the other process when computing the sum of tickets in the venue.

Figure 1b pictures a trace of FusionTicket under CC, which is a witness that FusionTicket is not robust against substituting PC with CC. This trace is also a violation of the intended specification since the number of tickets is not increasing (the sum of tickets is 3 in both processes). The happens-before dependencies (pictured with HB labeled edges) include the program-order PO (the order between transactions in the same process), and read-write dependencies, since an instance of CountTickets(v) does not observe the value written by the CreateEvent transaction in the other process (the latter overwrites some value that the former reads). This trace is allowed under CC because the transaction CreateEvent(v, e1, 3) executes concurrently with the transaction CountTickets(v) in the other process, and similarly for CreateEvent(v, e2, 3). However, it is not allowed under PC since it is impossible to define a total commit order between CreateEvent(v, e1, 3) and CreateEvent(v, e2, 3) that justifies the reads of both CountTickets(v) transactions (these reads should correspond to the updates in a prefix of this order). For instance, assuming that CreateEvent(v, e1, 3) commits before CreateEvent(v, e2, 3), CountTickets(v) in the second process must observe the effect of CreateEvent(v, e1, 3) as well since it observes the effect of CreateEvent(v, e2, 3). However, this contradicts the fact that CountTickets(v) computes the sum of tickets as being 3.

On the other hand, Twitter is robust against substituting PC with CC. For instance, Figure 1d pictures a trace of Twitter under CC, where the assume in both transactions pass. In this trace, the transactions Register(u,p1) and Register(u,p2) execute concurrently and are unaware of each other's writes (they are not causally related). The HB dependencies include write-write dependencies since both transactions write on the same location (we consider the transaction in Process 2 to be the last one writing to the Password map), and read-write dependencies since each transaction reads RegisteredUsers that is written by the other. This trace is also allowed under PC since the commit order can be defined such that Register(u,p1) is ordered before Register(u,p2), and then both transactions read from the initial state (the empty prefix). Note that this trace has a cyclic happens-before which means that it is not allowed under serializability.

**Checking robustness** PC **vs** CC**.** We reduce the problem of checking robustness against substituting PC with CC to the robustness problem against substituting SER with CC (the latter reduces to a reachability problem under SER [10]). This reduction relies on a syntactic program transformation that rewrites PC behaviors of a given program P to SER behaviors of another program P′. The program P′ is obtained by splitting each transaction t of P into two transactions: the first transaction performs all the reads in t and the second performs all the writes in t (the two are related by program order). Figure 1e shows this transformation applied to Twitter. The trace in Figure 1f is a serializable execution of the transformed Twitter which is "observationally" equivalent to the trace in Figure 1d of the original Twitter, i.e., each read of the shared state returns the same value and the writes on the shared state are applied in the same order (the acyclicity of the happens-before shows that this is a serializable trace). The transformed FusionTicket coincides with the original version because it contains no transaction that both reads and writes the shared state.
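The splitting transformation can be sketched in a few lines of Python. This is an illustration of the idea only, not the paper's formal construction: the representation of a program as nested lists of `(kind, variable)` events, and the names `split_transactions` and `twitter`, are assumptions made here for the example.

```python
# Hypothetical representation: a program is a list of processes, a process is
# a list of transactions, and a transaction is a list of (kind, variable)
# events with kind in {'read', 'write'}.

def split_transactions(program):
    """Rewrite each transaction that both reads and writes into a read-only
    transaction followed (in program order) by a write-only transaction."""
    transformed = []
    for process in program:
        new_process = []
        for txn in process:
            reads = [ev for ev in txn if ev[0] == 'read']
            writes = [ev for ev in txn if ev[0] == 'write']
            if reads and writes:
                new_process.append(reads)   # first half: all the reads
                new_process.append(writes)  # second half: all the writes
            else:
                new_process.append(txn)     # read-only/write-only: unchanged
        transformed.append(new_process)
    return transformed

# The Register transaction of Twitter reads RegisteredUsers and then writes
# RegisteredUsers and Password; splitting yields two transactions.
twitter = [[[('read', 'RegisteredUsers[u]'),
             ('write', 'RegisteredUsers[u]'),
             ('write', 'Password[u]')]]]
split = split_transactions(twitter)
assert len(split[0]) == 2
```

A program such as FusionTicket, whose transactions are read-only or write-only, is left unchanged by this function, matching the observation that the transformed FusionTicket coincides with the original.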

We show that PC behaviors and SER behaviors of the original and transformed program, respectively, are related by a bijection. In particular, we show that any PC vs. CC robustness violation of the original program manifests as a SER vs. CC robustness violation of the transformed program, and vice-versa. For instance, the CC trace of the original Twitter in Figure 1d corresponds to the CC trace of the transformed Twitter in Figure 1f, and the acyclicity of the latter (the fact that it is admitted by SER) implies that the former is admitted by the original Twitter under PC. On the other hand, the trace in Figure 1b is also a CC trace of the transformed FusionTicket, and its cyclicity implies that it is not admitted by FusionTicket under PC; thus, it represents a robustness violation.

**Robustness** SI **vs** PC**.** We illustrate the robustness against substituting SI with PC using Twitter and the Betting program in Figure 1g. Twitter is not robust against substituting SI with PC, the trace in Figure 1d being a witness violation. This trace is also a violation of the intended specification since one of the users registers a password that is overwritten in a concurrent transaction. This PC trace is not possible under SI because Register(u,p1) and Register(u,p2) observe the same prefix of the commit order (i.e., an empty prefix), but they write to a common memory location Password[u] which is not allowed under SI.

On the other hand, the Betting program in Figure 1g, which manages a set of bets, is robust against substituting SI with PC. The first two processes execute one transaction that places a bet of a value v with a unique bet identifier id, assuming that the bet expiration time is not yet reached (bets are recorded in the map Bets). The third process contains a single transaction that settles the betting assuming that the bet expiration time was reached and at least one bet has been placed. This transaction starts by taking a snapshot of the Bets map into a local variable Bets', and then selects a random non-null value (different from ⊥) in the map to correspond to the winning bet. The intended specification of this program is that the winning bet corresponds to a genuine bet that was placed. Figure 1g pictures a PC trace of Betting where SettleBet observes only the bet of the first process PlaceBet(1,2). The HB dependency towards the second process denotes a read-write dependency (SettleBet reads a cell of the map Bets which is overwritten by the second process). This trace is allowed under SI because no two transactions write to the same location.

**Checking robustness** SI **vs** PC**.** We reduce robustness against substituting SI with PC to a reachability problem under SER. This reduction is based on a characterization of happens-before cycles<sup>2</sup> that are possible under PC but not under SI, and on the transformation described above that allows simulating the PC semantics of a program on top of SER. The former is used to define an instrumentation (monitor) for the transformed program that reaches an error state iff the original program is not robust. Concretely, we show that the happens-before cycles in PC traces that are not admitted by SI must contain a transaction that (1) overwrites a value written by another transaction in the cycle and (2) reads a value overwritten by another transaction in the cycle. For instance, the trace of Twitter in Figure 1d is not allowed under SI because Register(u,p2) overwrites a value written by Register(u,p1) (the password) and reads a value overwritten by Register(u,p1) (checking whether the username u is registered). The trace of Betting in Figure 1g is allowed under SI because its happens-before is acyclic.

**Checking robustness using commutativity arguments.** Based on the reductions above, we propose an approximated method for proving robustness based on the concept of mover in Lipton's reduction theory [39]. A transaction is a left (resp., right) mover if it commutes to the left (resp., right) of another transaction (by a different process) while preserving the computation. We use the notion of mover to characterize the data-flow dependencies in the happens-before. Roughly, there exists a data-flow dependency between two transactions in some execution if one doesn't commute to the left/right of the other one.

We define a commutativity dependency graph, which summarizes the happens-before dependencies in all executions of a transformed program (obtained by splitting the transactions of the original program as explained above), and derive a proof method for robustness which inspects paths in this graph. Two transactions t1 and t2 are linked by a directed edge iff t1 cannot move to the right of t2 (or t2 cannot move to the left of t1), or if they are related by the program order. Moreover, two transactions t1 and t2 are linked by an undirected edge iff they are the result of splitting the same transaction.
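As an illustration, the directed edges of such a graph can be over-approximated by plain read/write conflicts between transactions, as in the following Python sketch. The paper's mover analysis is more precise than this; the representation of transactions as `(name, read_set, write_set)` triples and the helper names are assumptions made here for the example.

```python
# Rough over-approximation: t1 and t2 fail to commute when one writes a
# variable that the other reads or writes.

def conflict(t1, t2):
    _, r1, w1 = t1
    _, r2, w2 = t2
    return bool(w1 & (r2 | w2)) or bool(r1 & w2)

def dependency_edges(transactions):
    """Directed edges of an (approximate) commutativity dependency graph."""
    edges = set()
    for t1 in transactions:
        for t2 in transactions:
            if t1[0] != t2[0] and conflict(t1, t2):
                edges.add((t1[0], t2[0]))
    return edges

# The Betting program: two write-only PlaceBet transactions and a read-only
# SettleBet transaction (variable names are illustrative).
txns = [('PlaceBet1', set(), {'Bets[1]'}),
        ('PlaceBet2', set(), {'Bets[2]'}),
        ('SettleBet', {'Bets[1]', 'Bets[2]'}, set())]
edges = dependency_edges(txns)
```

On this example the two PlaceBet transactions touch disjoint variables, so every simple cycle in the graph has length 2 (between a PlaceBet and SettleBet), which is consistent with the robustness argument for Betting given in the text.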

A program is robust against substituting PC with CC if, roughly, its commutativity dependency graph does not contain a simple cycle of directed edges with two distinct transactions t1 and t2 such that t1 does not commute left because of another transaction t3 in the cycle that reads a variable that t1 writes to,

<sup>2</sup> Traces with an acyclic happens-before are not robustness violations because they are admitted under serializability, which implies that they are admitted under the weaker model SI as well.


Fig. 2: The syntax of our programming language. a<sup>∗</sup> indicates zero or more occurrences of a. pid, reg, label, and var represent a process identifier, a register, a label, and a shared variable, respectively. reg-expr is an expression over registers while bexpr is a Boolean expression over registers, or the non-deterministic choice ∗.

and t2 does not commute right because of another transaction t4 in the cycle (t3 and t4 can coincide) that writes to a variable that t2 either reads from or writes to<sup>3</sup>. For instance, Figure 1i shows the commutativity dependency graph of the transformed Betting program, which coincides with the original Betting because PlaceBet(1,2) and PlaceBet(2,3) are write-only transactions and SettleBet() is a read-only transaction. Both simple cycles in Figure 1i contain just two transactions and therefore do not meet the criterion above, which requires at least 3 transactions. Therefore, Betting is robust against substituting PC with CC.

A program is robust against substituting SI with PC if, roughly, its commutativity dependency graph does not contain a simple cycle with two successive transactions t1 and t2 that are linked by an undirected edge, such that t1 does not commute left because of another transaction t3 in the cycle that writes to a variable that t1 writes to, and t2 does not commute right because of another transaction t4 in the cycle (t3 and t4 can coincide) that writes to a variable that t2 reads from<sup>4</sup>. Betting is also robust against substituting SI with PC for the same reason (simple cycles of size 2).

# **3 Consistency Models**

**Syntax.** We present our results in the context of the simple programming language, defined in Figure 2, where a program is a parallel composition of processes distinguished using a set of identifiers P. A process is a sequence of transactions and each transaction is a sequence of labeled instructions. A transaction starts with a begin instruction and finishes with a commit instruction. Instructions include assignments to a process-local register from a set R or to a shared variable from a set V, or an assume. The assignments use values from a data domain

<sup>3</sup> The transactions t1, t2, t3, and t4 correspond to t1, ti, tn, and ti+1, respectively, in Theorem 6.

<sup>4</sup> The transactions t1, t2, t3, and t4 correspond to t1, t2, tn, and t3, respectively, in Theorem 7.

D. An assignment to a register reg := var is called a read of the shared variable var, and an assignment to a shared variable var := reg is called a write to the shared variable var. The statement assume bexpr blocks the process if the Boolean expression bexpr over registers is false. It can be used to model conditionals. The goto statement transfers the control to the program location (instruction) specified by a given label. Since multiple instructions can have the same label, goto statements can be used to mimic imperative constructs like loops and conditionals inside transactions.

We assume w.l.o.g. that every transaction is written as a sequence of reads or assume statements followed by a sequence of writes (a single goto statement from the sequence of read/assume instructions transfers the control to the sequence of writes). In the context of the consistency models we study in this paper, every program can be equivalently rewritten as a set of transactions of this form.

To simplify the technical exposition, programs contain a bounded number of processes and each process executes a bounded number of transactions. A transaction may execute an unbounded number of instructions but these instructions concern a bounded number of variables, which makes it impossible to model SQL (select/update) queries that may access tables with a statically unknown number of rows. Our results can be extended beyond these restrictions as explained in Remark 1 and Remark 2.

**Semantics.** We describe the semantics of a program under four consistency models, i.e., causal consistency<sup>5</sup> (CC), prefix consistency (PC), snapshot isolation (SI), and serializability (SER).

In the semantics of a program under CC, shared variables are replicated across processes, each process maintaining its own local valuation of these variables. During the execution of a transaction in a process, its writes are stored in a transaction log that can be accessed only by the process executing the transaction and that is broadcast to all the other processes at the end of the transaction. To read a shared variable x, a process p first accesses its transaction log and takes the last written value on x, if any, and otherwise its own valuation of the shared variable, if x was not written during the current transaction. Transaction logs are delivered to every process in an order consistent with the causal relation between transactions, i.e., the transitive closure of the union of the program order (the order in which transactions are executed by a process) and the read-from relation (a transaction t1 reads-from a transaction t2 iff t1 reads a value that was written by t2). When a process receives a transaction log, it immediately applies it on its shared-variable valuation.
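The read and commit rules just described can be sketched operationally in Python. This is a toy model with hypothetical names (`Process`, `begin`, `commit`); in particular, causal delivery of logs between processes is not modelled here, logs simply being applied synchronously at commit.

```python
class Process:
    """One replica: a local valuation of the shared variables plus the
    transaction log of the currently running transaction."""

    def __init__(self, initial):
        self.store = dict(initial)  # local valuation of shared variables
        self.txn_log = {}           # writes of the current transaction

    def begin(self):
        self.txn_log = {}

    def write(self, x, v):
        self.txn_log[x] = v

    def read(self, x):
        # First the transaction log (read-own-write), else the local replica.
        return self.txn_log.get(x, self.store[x])

    def commit(self, others):
        # Broadcast the log; every replica applies it on receipt.
        for p in [self] + list(others):
            p.store.update(self.txn_log)
        self.txn_log = {}

p, q = Process({'x': 0}), Process({'x': 0})
p.begin(); p.write('x', 1)
assert p.read('x') == 1   # p sees its own uncommitted write via the log
assert q.read('x') == 0   # q has not received p's log yet
p.commit([q])
assert q.read('x') == 1
```

The interleaving before `commit` is exactly what allows CC-only behaviors: another process can run an entire transaction against its replica without observing p's pending writes.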

In the semantics of a program under PC and SI, shared variables are stored in a central memory and each process keeps a local valuation of these variables. When a process starts a new transaction, it fetches a consistent snapshot of the shared variables from the central memory and stores it in its local valuation of these variables. During the execution of a transaction in a process, writes to shared variables are stored in the local valuation of these variables, and in a transaction log. To read a shared variable, a process takes its own valuation of the

<sup>5</sup> We consider a variation known as causal convergence [20, 16].

shared variable. A process commits a transaction by applying the updates in the transaction log on the central memory in an atomic way (to make them visible to all processes). Under SI, when a process applies the writes in a transaction log on the central memory, it must ensure that there were no concurrent writes that occurred after the last fetch from the central memory to a shared variable that was written during the current transaction. Otherwise, the transaction is aborted and its effects discarded.

In the semantics of a program under SER, we adopt a simple operational model where we keep a single shared-variable valuation in a central memory (accessed by all processes) with the standard interpretation of read and write statements. Transactions execute serially, one after another.

We use a standard model of executions of a program called a trace. A trace represents the order between transactions in the same process, and the data-flow in an execution, using standard happens-before relations between transactions. We assume that each transaction in a program is identified uniquely using a transaction identifier from a set T. Also, f : T → 2<sup>S</sup> is a mapping that associates each transaction in T with a sequence of read and write events from the set

$$\mathbb{S} = \{ \mathbf{re}(t, x, v), \mathbf{we}(t, x, v) : t \in \mathbb{T}, x \in \mathbb{V}, v \in \mathbb{D} \}$$

where re(t, x, v) is a read of x returning v, and we(t, x, v) is a write of v to x.

**Definition 1.** A trace is a tuple τ = (ρ, f, TO, PO, WR, WW, RW) where ρ ⊆ T is a set of transaction identifiers, and


For simplicity, for a trace τ = (ρ, f, TO, PO, WR, WW, RW), we write t ∈ τ instead of t ∈ ρ. We also assume that each trace contains a fictitious transaction that writes the initial values of all shared variables, and which is ordered before any other transaction in program order. Also, TrX(P) denotes the set of traces representing executions of program P under a consistency model X.

For each X ∈ {CC, PC, SI, SER}, the set of traces TrX(P) can be described using the set of properties in Table 1. A trace τ is possible under causal consistency iff there exist two relations, a partial order CO (causal order) and a total order ARB (arbitration order) that includes CO, such that the properties AxCausal, AxArb, and AxRetVal hold [27, 16]. AxCausal guarantees that the program order and the read-from relation are included in the causal order, and AxArb guarantees


**–** there exist a transaction t0 = MaxARB({t′ ∈ τ | (t′, t) ∈ CO ∧ ∃ we(t′, x, ·) ∈ f(t′)}) and an event we(t0, x, v) = MaxTO(t0)({we(t0, x, ·) ∈ f(t0)}).

Table 1: Declarative definitions of consistency models. For an order relation ≤, a = Max≤(A) iff a ∈ A ∧ ∀b ∈ A. b ≤ a.

that the causal order and the store order are included in the arbitration order. AxRetVal guarantees that a read returns the value written by the last write in the last transaction that contains a write to the same variable and that is ordered by CO before the read's transaction. We use AxCC to denote the conjunction of these three properties. A trace τ is possible under prefix consistency iff there exist a causal order CO and an arbitration order ARB such that AxCC holds and the property AxPrefix holds as well [27]. AxPrefix guarantees that every transaction observes a prefix of the transactions that are ordered by ARB before it. We use AxPC to denote the conjunction of AxCC and AxPrefix. A trace τ is possible under snapshot isolation iff there exist a causal order CO and an arbitration order ARB such that AxPC holds and the property AxConflict holds [27]. AxConflict guarantees that if two transactions write to the same variable then one of them must observe the other. We use AxSI to denote the conjunction of AxPC and AxConflict. A trace τ is serializable iff there exist a causal order CO and an arbitration order ARB such that the property AxSer holds, which implies that the two relations CO and ARB coincide. Note that for any given program P, TrSER(P) ⊆ TrSI(P) ⊆ TrPC(P) ⊆ TrCC(P). Also, the four consistency models we consider disallow anomalies such as dirty and phantom reads.

For a given trace τ = (ρ, f, TO, PO, WR, WW, RW), the happens-before order is the transitive closure of the union of all the dependency relations in the trace, i.e., HB = (PO ∪ WR ∪ WW ∪ RW)⁺. A classic result states that a trace τ is serializable iff HB is acyclic [2, 47]. Note that acyclicity of HB implies that WW is a total order between transactions that write to the same variable, and that (PO ∪ WR)⁺ and (PO ∪ WR ∪ WW)⁺ are acyclic.
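As an illustration (not part of the formal development), this acyclicity check can be implemented directly, assuming a trace is given as explicit sets of edges between transaction identifiers; all names below are hypothetical:

```python
# Sketch: serializability test for a trace via acyclicity of the
# happens-before relation HB = (PO ∪ WR ∪ WW ∪ RW)+.

def is_serializable(transactions, po, wr, ww, rw):
    """transactions: iterable of transaction ids; po/wr/ww/rw: sets of (t, t') edges."""
    hb = set(po) | set(wr) | set(ww) | set(rw)
    adj = {t: [] for t in transactions}
    for a, b in hb:
        adj[a].append(b)
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {t: WHITE for t in transactions}

    def has_cycle(t):
        # depth-first search; a GRAY successor closes a cycle
        color[t] = GRAY
        for u in adj[t]:
            if color[u] == GRAY or (color[u] == WHITE and has_cycle(u)):
                return True
        color[t] = BLACK
        return False

    return not any(color[t] == WHITE and has_cycle(t) for t in transactions)
```

For instance, the SB execution of Figure 3a yields the cycle t_1 →PO t_2 →RW t_3 →PO t_4 →RW t_1, so the check reports the trace as not serializable.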

### **3.1 Robustness**

In this work, we investigate the problem of checking whether a program P under a semantics Y ∈ {PC, SI} produces the same set of traces as under a weaker semantics X ∈ {CC, PC}. When this holds, we say that P is robust against X relative to Y.

**Definition 2.** A program P is called robust against a semantics X ∈ {CC, PC, SI} relative to a semantics Y ∈ {PC, SI, SER} such that Y is stronger than X iff Tr_X(P) = Tr_Y(P).

If P is not robust against X relative to Y then there must exist a trace τ ∈ Tr_X(P) \ Tr_Y(P). We say that τ is a robustness violation trace.

We illustrate the notion of robustness on the programs in Figure 3, which are commonly used in the literature. In all programs, transactions of the same process are aligned vertically and ordered from top to bottom. Each read instruction is commented with the value it reads in some execution.

The store buffering (SB) program in Figure 3a contains four transactions that are issued by two distinct processes. We emphasize an execution where t_2 reads 0 from y and t_4 reads 0 from x. This execution is allowed under CC since the two writes by t_1 and t_3 are not causally dependent. Thus, t_2 and t_4 are executed without seeing the writes of t_3 and t_1, respectively. However, this execution is not feasible under PC (which implies that it is not feasible under SI and SER either). In particular, we can have neither (t_1, t_3) ∈ ARB nor (t_3, t_1) ∈ ARB, which contradicts the fact that ARB is a total order. For example, if (t_1, t_3) ∈ ARB, then (t_1, t_4) ∈ CO (since ARB; CO ⊂ CO), which contradicts the fact that t_4 does not see t_1.

(a) Store Buffering (SB):
t_1 [x := 1]        t_3 [y := 1]
t_2 [r1 := y] //0   t_4 [r2 := x] //0

(b) Lost Update (LU):
t_1 [r1 := x //0; x := r1 + 1]    t_2 [r2 := x //0; x := r2 + 1]

(c) Write Skew (WS):
t_1 [r1 := x //0; y := 1]    t_2 [r2 := y //0; x := 1]

(d) Message Passing (MP):
t_1 [x := 1]        t_3 [r1 := y] //1
t_2 [y := 1]        t_4 [r2 := x] //1

Fig. 3: Litmus programs. Transactions of the same process are aligned vertically and ordered from top to bottom (program order); distinct columns correspond to distinct processes.

Similarly, (t_3, t_1) ∈ ARB implies that (t_3, t_2) ∈ CO, which contradicts the fact that t_2 does not see t_3. Thus, SB is not robust against CC relative to PC.

The lost update (LU) program in Figure 3b has two transactions that are issued by two distinct processes. We highlight an execution where both transactions read 0 from x. This execution is allowed under PC since the two transactions are not causally dependent and can be executed in parallel by the two processes. However, it is not allowed under SI since both transactions write to a common variable (i.e., x). Thus, they cannot be executed in parallel, and one of them must see the write of the other. Thus, LU is not robust against PC relative to SI.

The write skew (WS) program in Figure 3c has two transactions that are issued by two distinct processes. We highlight an execution where t_1 reads 0 from x and t_2 reads 0 from y. This execution is allowed under SI since the two transactions are not causally dependent, do not write to a common variable, and can be executed in parallel by the two processes. However, this execution is not allowed under SER since one of the two transactions must see the write of the other. Thus, WS is not robust against SI relative to SER.

The message passing (MP) program in Figure 3d has four transactions issued by two processes. Because t_1 and t_2 are causally dependent, under any semantics X ∈ {CC, PC, SI, SER} there are only three possible executions of MP: either t_3 and t_4 observe none of the writes of t_1 and t_2, or t_3 and t_4 observe the writes of both t_1 and t_2, or only t_4 observes the write of t_1 (we highlight the values read in the second case in Figure 3d). Therefore, the executions of this program under the four consistency models coincide, and MP is robust against CC relative to any other model.

# **4 Robustness Against** CC **Relative to** PC

We show that checking robustness against CC relative to PC can be reduced to checking robustness against CC relative to SER. The crux of this reduction is a program transformation that allows us to simulate the PC semantics of a program P using the SER semantics of a program P♣. Checking robustness against CC relative to SER can in turn be reduced in polynomial time to reachability under SER [10].

Given a program P with a set of transactions Tr(P), we define a program P♣ such that every transaction t ∈ Tr(P) is split into a transaction t[r] that contains all the read/assume statements in t (in the same order) and another transaction t[w] that contains all the write statements in t (in the same order). We then establish the following result:

**Theorem 1.** A program P is robust against CC relative to PC iff P♣ is robust against CC relative to SER.

Intuitively, under PC, processes can execute concurrent transactions that fetch the same consistent snapshot of the shared variables from the central memory and subsequently commit their writes. Decoupling the read part of a transaction from the write part allows us to simulate such behaviors even under SER.
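A minimal sketch of this transformation, assuming transactions are represented as ordered lists of tagged statements (the representation and names are hypothetical, for illustration only):

```python
# Sketch of the P → P♣ transformation: each transaction, given as an ordered
# list of ('read' | 'assume' | 'write', stmt) pairs, is split into a
# read/assume part t[r] and a write part t[w], each preserving order.

def split_transaction(t):
    t_r = [(k, s) for (k, s) in t if k in ('read', 'assume')]
    t_w = [(k, s) for (k, s) in t if k == 'write']
    return t_r, t_w

def split_program(program):
    """program: dict mapping a process name to its list of transactions.
    In the result, t[r] is issued immediately before t[w]."""
    return {p: [half for t in ts for half in split_transaction(t)]
            for p, ts in program.items()}
```

For example, the LU transaction [r1 := x; x := r1 + 1] is split into t[r] = [r1 := x] followed by t[w] = [x := r1 + 1], as in Figure 4.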

The proof of this theorem relies on several intermediate results concerning the relationship between traces of P and P♣. Let τ = (ρ, PO, WR, WW, RW) ∈ Tr_X(P) be a trace of a program P under a semantics X. We define the trace τ♣ = (ρ♣, PO♣, WR♣, WW♣, RW♣) where every transaction t ∈ τ is split into two transactions t[r] ∈ τ♣ and t[w] ∈ τ♣, and the dependency relations are straightforward adaptations, i.e.,

**–** (t[r], t[w]) ∈ PO♣ for every t, and (t[w], t'[r]) ∈ PO♣ whenever (t, t') ∈ PO;
**–** (t[w], t'[r]) ∈ WR♣ whenever (t, t') ∈ WR;
**–** (t[w], t'[w]) ∈ WW♣ whenever (t, t') ∈ WW;
**–** (t[r], t'[w]) ∈ RW♣ whenever (t, t') ∈ RW.
For instance, Figure 4 pictures the trace τ♣ for the LU trace τ given in Figure 3b. For traces τ of programs that contain singleton transactions, e.g., SB in Figure 3a, τ♣ coincides with τ .

Conversely, for a given trace τ♣ = (ρ♣, PO♣, WR♣, WW♣, RW♣) ∈ Tr_X(P♣)

t_1[r]: [r1 := x] //0    t_2[r]: [r2 := x] //0
t_1[w]: [x := r1 + 1]    t_2[w]: [x := r2 + 1]
with (t_1[r], t_1[w]) ∈ PO♣, (t_2[r], t_2[w]) ∈ PO♣, (t_1[r], t_2[w]) ∈ RW♣, (t_2[r], t_1[w]) ∈ RW♣, and (t_1[w], t_2[w]) ∈ WW♣.

Fig. 4: A trace of the transformed LU program (LU♣).

of a program P♣ under a semantics X, we define the trace τ = (ρ, PO, WR, WW, RW) where the two components t[r] and t[w] of each transaction are merged into a transaction t ∈ τ. The dependency relations are defined in a straightforward way, e.g., if (t'[w], t[w]) ∈ WW♣ then (t', t) ∈ WW.

The following lemma shows that for any semantics X ∈ {CC, PC, SI}, if τ ∈ Tr_X(P) for a program P, then τ♣ is a valid trace of P♣ under X, i.e., τ♣ ∈ Tr_X(P♣). Intuitively, this lemma shows that splitting transactions in a trace and defining dependency relations appropriately cannot introduce cycles in these relations and preserves the validity of the different consistency axioms.

The proof of this lemma relies on constructing a causal order CO♣ and an arbitration order ARB ♣ for the trace τ♣ starting from the analogous relations in τ . In the case of CC, these are the smallest transitive relations such that:

**–** PO♣ ⊆ CO♣ ⊆ ARB♣, and
**–** if (t_1, t_2) ∈ CO then (t_1[w], t_2[r]) ∈ CO♣, and if (t_1, t_2) ∈ ARB then (t_1[w], t_2[r]) ∈ ARB♣.

For PC and SI, CO♣ must additionally satisfy: if (t_1, t_2) ∈ ARB, then (t_1[w], t_2[w]) ∈ CO♣. This is required in order to satisfy the axiom AxPrefix, i.e., ARB♣; CO♣ ⊂ CO♣, when (t_1[w], t_2[r]) ∈ ARB♣ and (t_2[r], t_2[w]) ∈ CO♣.

This construction ensures that CO♣ is a partial order and ARB♣ is a total order because CO is a partial order and ARB is a total order. Also, based on the above rules, we have that if (t_1[w], t_2[r]) ∈ CO♣ then (t_1, t_2) ∈ CO, and similarly, if (t_1[w], t_2[r]) ∈ ARB♣ then (t_1, t_2) ∈ ARB.

**Lemma 1.** If τ ∈ Tr_X(P), then τ♣ ∈ Tr_X(P♣).

Before presenting a strengthening of Lemma 1 when X is CC, we give an important characterization of CC traces. This characterization is stated in terms of acyclicity properties.

**Lemma 2.** τ is a trace under CC iff ARB_0⁺ and CO_0⁺; RW are acyclic (ARB_0 and CO_0 are defined in Table 1).

Next we show that a trace τ of a program P is CC iff the corresponding trace τ♣ of P♣ is CC as well. This result is based on the observation that cycles in ARB_0⁺ or CO_0⁺; RW cannot be broken by splitting transactions.

**Lemma 3.** A trace τ of P is CC iff the corresponding trace τ♣ of P♣ is CC.

The following lemma shows that a trace τ is PC iff the corresponding trace τ♣ is SER. The if direction in the proof is based on constructing a causal order CO and an arbitration order ARB for the trace τ from the arbitration order ARB♣ in τ♣ (since τ♣ is a trace under serializability, CO♣ and ARB♣ coincide). These are the smallest transitive relations such that:

**–** if (t_1[w], t_2[r]) ∈ CO♣ then (t_1, t_2) ∈ CO, and
**–** if (t_1[w], t_2[w]) ∈ CO♣ then (t_1, t_2) ∈ ARB.⁶
⁶ If t_1[w] is empty (t_1 is read-only), then we set (t_1, t_2) ∈ ARB if (t_1[r], t_2[w]) ∈ CO♣. If t_2[w] is empty, then (t_1, t_2) ∈ ARB if (t_1[w], t_2[r]) ∈ CO♣. If both t_1[w] and t_2[w] are empty, then (t_1, t_2) ∈ ARB if (t_1[r], t_2[r]) ∈ CO♣.

The only-if direction is based on the fact that any cycle in the dependency relations of τ that is admitted under PC (characterized in Lemma 7) is "broken" by splitting transactions. Also, splitting transactions cannot introduce new cycles that do not originate in τ .

**Lemma 4.** A trace τ is PC iff τ♣ is SER.

The lemmas above are used to prove Theorem 1 as follows:

Proof of Theorem 1: For the if direction, assume by contradiction that P is not robust against CC relative to PC. Then, there must exist a trace τ ∈ Tr_CC(P) \ Tr_PC(P). Lemmas 3 and 4 imply that the corresponding trace τ♣ of P♣ is CC and not SER. Thus, P♣ is not robust against CC relative to SER. The only-if direction is proved similarly. □

Robustness against CC relative to SER has been shown to be reducible in polynomial time to the reachability problem under SER [10]. Given a program P and a control location ℓ, the reachability problem under SER asks whether there exists an execution of P under SER that reaches ℓ. Therefore, as a corollary of Theorem 1, we obtain the following:

**Corollary 1.** Checking robustness against CC relative to PC is reducible to the reachability problem under SER in polynomial time.

In the following we discuss the complexity of this problem in the case of finite-state programs (bounded data domain). The upper bound follows from Corollary 1 and standard results about the complexity of the reachability problem under sequential consistency, which extend to SER, with a bounded [35] or parametric number of processes [45]. For the lower bound, given an instance (P, ℓ) of the reachability problem under sequential consistency, we construct a program P′ where each statement s of P is executed in a different transaction that guards⁷ the execution of s using a global lock (the lock can be implemented in our programming language as usual, e.g., using a busy wait loop for locking), and where reaching the location ℓ enables the execution of a "gadget" that corresponds to the SB program in Figure 3a. Executing each statement under a global lock ensures that every execution of P′ under CC is serializable and faithfully represents an execution of P under sequential consistency. Moreover, P reaches ℓ iff P′ contains a robustness violation, which is due to the SB execution.
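The lower-bound construction can be sketched as follows; the representation of statements and transactions is hypothetical, chosen only to illustrate the shape of the reduction:

```python
# Sketch of the lower-bound construction: each statement of the input program
# runs in its own transaction [lock; s; unlock], so every CC execution of the
# constructed program is serializable.

def guard_with_lock(stmts):
    """Wrap each statement s in its own lock-guarded transaction."""
    return [["lock", s, "unlock"] for s in stmts]

# The SB gadget from Figure 3a, enabled once the target location is reached;
# a robustness violation exists iff the gadget becomes executable.
SB_GADGET = {
    "p1": [["x := 1"], ["r1 := y"]],
    "p2": [["y := 1"], ["r2 := x"]],
}
```

Note that the gadget's transactions are singletons, so splitting them (Section 4) leaves them unchanged.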

**Corollary 2.** Checking robustness of a program with a fixed number of variables and bounded data domain against CC relative to PC is PSPACE-complete when the number of processes is bounded and EXPSPACE-complete, otherwise.

# **5 Robustness Against** PC **Relative to** SI

In this section, we show that checking robustness against PC relative to SI can be reduced in polynomial time to a reachability problem under the SER semantics. We reuse the program transformation from the previous section that allows us to simulate PC behaviors on top of SER, and additionally, we provide a characterization of traces that distinguish the PC semantics from SI. We use this

⁷ That is, the transaction is of the form [lock; s; unlock].

characterization to define an instrumentation (monitor) that is able to detect if a program under PC admits such traces.

We show that the happens-before cycles in a robustness violation (against PC relative to SI) must contain a WW dependency followed by a RW dependency, and that they do not contain two successive RW dependencies. This follows from the fact that every happens-before cycle in a PC trace must contain either two successive RW dependencies, or a WW dependency followed by a RW dependency. Otherwise, the happens-before cycle would imply a cycle in the arbitration order. Moreover, any trace under PC in which all simple happens-before cycles contain two successive RW dependencies is possible under SI. For instance, the trace of the non-robust LU execution in Figure 3b contains a WW dependency followed by a RW dependency and does not contain two successive RW dependencies, which is disallowed under SI, while the trace of the robust WS execution in Figure 3c contains two successive RW dependencies. As a first step, we prove the following theorem characterizing traces that are allowed under both PC and SI.

**Theorem 2.** A program P is robust against PC relative to SI iff every happens-before cycle in a trace of P under PC contains two successive RW dependencies.

Before giving the proof of the above theorem, we state several intermediate results that characterize cycles in PC or SI traces. First, we show that every PC trace in which all simple happens-before cycles contain two successive RW dependencies is also a SI trace.

**Lemma 5.** If a trace τ is PC and all happens-before cycles in τ contain two successive RW dependencies, then τ is SI.

The proof of Theorem 2 also relies on the following lemma that characterizes happens-before cycles permissible under SI.

**Lemma 6.** [23, 13] If a trace τ is SI, then all its happens-before cycles must contain two successive RW dependencies.

Proof of Theorem 2: For the only-if direction, if P is robust against PC relative to SI then every trace τ of P under PC is SI as well. Therefore, by Lemma 6, all happens-before cycles in τ contain two successive RW dependencies, which concludes the proof of this direction. For the reverse, let τ be a trace of P under PC such that all its happens-before cycles contain two successive RW dependencies. Then, by Lemma 5, τ is SI. Thus, every trace τ of P under PC is SI. □

Next, we present an important lemma that characterizes happens-before cycles possible under the PC semantics. This is a strengthening of a result in [13] which shows that all happens-before cycles under PC must have two successive dependencies in {RW, WW} and at least one RW. We show that the two successive dependencies cannot be a RW followed by a WW, or two successive WW.

**Lemma 7.** If a trace τ is PC then all happens-before cycles in τ must contain either two successive RW dependencies or a WW dependency followed by a RW dependency.

Combining the results of Theorem 2 and Lemmas 4 and 7, we obtain the following characterization of traces which violate robustness against PC relative to SI.

**Theorem 3.** A program P is not robust against PC relative to SI iff there exists a trace τ♣ of P♣ under SER such that the trace τ obtained by merging⁸ read and write transactions in τ♣ contains a happens-before cycle that does not contain two successive RW dependencies and that contains a WW dependency followed by a RW dependency.
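As an illustration, the cycle conditions of Lemma 7 and Theorem 3 can be checked on an explicit cycle, given as the cyclic sequence of its dependency labels (a sketch with hypothetical names, not the instrumentation developed below):

```python
# Sketch: classify a simple happens-before cycle, given as the cyclic
# sequence of its dependency labels ('PO', 'WR', 'WW', 'RW').

def has_pattern(labels, first, second):
    """True iff the label `first` is immediately followed by `second`,
    treating the sequence as cyclic."""
    n = len(labels)
    return any(labels[i] == first and labels[(i + 1) % n] == second
               for i in range(n))

def possible_under_pc(labels):
    # Lemma 7: a cycle admitted under PC has two successive RW
    # dependencies, or a WW dependency immediately followed by a RW.
    return has_pattern(labels, 'RW', 'RW') or has_pattern(labels, 'WW', 'RW')

def violates_pc_vs_si(labels):
    # Theorem 3: WW followed by RW, and no two successive RW dependencies.
    return (has_pattern(labels, 'WW', 'RW')
            and not has_pattern(labels, 'RW', 'RW'))
```

For example, the merged LU cycle has labels [WW, RW] and is flagged as a violation, while the WS cycle [RW, RW] is not.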

The results above enable a reduction from checking robustness against PC relative to SI to a reachability problem under the SER semantics. For a program P, we define an instrumentation denoted by [[P]], such that P is not robust against PC relative to SI iff [[P]] violates an assertion under SER. The instrumentation consists in rewriting every transaction of P as shown in Figure 6.

The instrumentation [[P]] running under SER simulates the PC semantics of P using the same idea of decoupling the execution of the read part of a transaction from the write part. It violates an assertion when it simulates a PC trace containing a happens-before cycle as in Theorem 3.

Fig. 5: Execution simulating a violation of robustness against PC relative to SI.

The execution corresponding to this trace has the shape given in Figure 5, where t# is the transaction that occurs between the WW and the RW dependencies, and every transaction executed after t# (this can be a full transaction in P, or only the read or write part of a transaction in P) is related by a happens-before path to t# (otherwise, the execution of this transaction can be reordered to occur before t#). A transaction in P can have its read part included in α and its write part included in β or γ. Also, β and γ may contain transactions in P that executed only their read part. It is possible that t_0 = t, β = γ = ε, and α = ε (the LU program shown in Figure 3b is an example where this can happen). The instrumentation uses auxiliary variables to track happens-before dependencies, which are explained below.

The instrumentation executes (incomplete) transactions without affecting the auxiliary variables, i.e., without tracking happens-before dependencies (lines 3 and 5), until a non-deterministically chosen point in time when it declares the current transaction as the candidate for t# (line 9). Only one candidate for t# can be chosen during the execution. This transaction executes only its reads, and it chooses non-deterministically a variable that it could write as a witness for the WW dependency (see lines 16-22). The name of this variable is stored in a global variable varW (see the definition of I#( x := e )). The writes are not applied to the shared memory. Intuitively, t# should be thought of as a transaction whose writes are delayed until after transaction t in Figure 5 has executed. The instrumentation checks that t# and t can be connected by some happens-before path that includes the RW and WW dependencies and that does not contain two consecutive RW dependencies. If this is the case, it violates an assertion at the commit point of t. Since the write part of t# is intuitively delayed to execute after t, the process executing t# is disabled all along the execution (see the assume false).

⁸ This transformation has been defined at the beginning of Section 4.

Transaction "begin read <sup>∗</sup> test <sup>∗</sup> write <sup>∗</sup> commit" is rewritten to:

```
 1 if ( !done# )
 2   if (*)
 3     begin <read>∗ <test>∗ commit
 4     if ( !done# )
 5       begin <write>∗ commit
 6     else
 7       I(begin) (I(<write>))∗ I(commit)
 8   else
 9     begin (I#(<read>))∗ <test>∗ (I#(<write>))∗ I#(commit)
10     assume false;
11 else if (*)
12   rdSet' := ∅;
13   wrSet' := ∅;
14   I(begin) (I(<read>))∗ <test>∗ I(commit)
15   I(begin) (I(<write>))∗ I(commit)

I#( r := x ):
16 r := x;
17 hbR['x'] := 0;
18 rdSet := rdSet ∪ { 'x' };

I#( x := e ):
19 if ( varW == ⊥ and * )
20   varW := 'x';

I#( commit ):
21 assume ( varW != ⊥ )
22 done# := true

I( begin ):
23 begin
24 hb := ⊥
25 if ( hbP != ⊥ and hbP < 2 )
26   hb := 0;
27 else if ( hbP == 2 )
28   hb := 2;

I( commit ):
29 assume ( hb != ⊥ )
30 assert ( hb == 2 or varW ∈ wrSet' );
31 if ( hbP == ⊥ or hbP > hb )
32   hbP := hb;
33 for each 'x' ∈ wrSet'
34   if ( hbW['x'] == ⊥ or hbW['x'] > hb )
35     hbW['x'] := hb;
36 for each 'x' ∈ rdSet'
37   if ( hbR['x'] == ⊥ or hbR['x'] > hb )
38     hbR['x'] := hb;
39 rdSet := rdSet ∪ rdSet';
40 wrSet := wrSet ∪ wrSet';
41 commit

I( r := x ):
42 r := x;
43 rdSet' := rdSet' ∪ { 'x' };
44 if ( 'x' ∈ wrSet )
45   if ( hbW['x'] != 2 )
46     hb := 0
47   else if ( hb == ⊥ )
48     hb := hbW['x']

I( x := e ):
49 x := e;
50 wrSet' := wrSet' ∪ { 'x' };
51 if ( 'x' ∈ wrSet )
52   if ( hbW['x'] != 2 )
53     hb := 0
54   else if ( hb == ⊥ )
55     hb := hbW['x']
56 if ( 'x' ∈ rdSet )
57   if ( hb == ⊥ or hb > hbR['x'] + 1 )
58     hb := min(hbR['x'] + 1, 2)
```
Fig. 6: A program instrumentation for checking robustness against PC relative to SI. The auxiliary variables used by the instrumentation are shared variables, except for hbP, rdSet', and wrSet', which are process-local variables; all are initially set to ⊥. This instrumentation uses program constructs which can be defined as syntactic sugar from the syntax presented in Section 3, e.g., if-then-else statements (outside transactions).

After choosing the candidate for t#, the instrumentation uses the auxiliary variables for tracking happens-before dependencies. The sets rdSet and wrSet record variables read and written, respectively, by transactions that are connected by a happens-before path to t# (in a trace of P). This is ensured by the assume at line 29. During the execution, the variables read or written by a transaction⁹ that writes a variable in rdSet (see line 56), or reads or writes a variable in wrSet (see lines 44 and 51), will be added to these sets (see lines 39

⁹ These are stored in the local variables rdSet' and wrSet' while the transaction is running.

and 40). Since the variables that t# writes in P are not recorded in wrSet, these happens-before paths must necessarily start with a RW dependency (from t#). When the assertion fails (line 30), the condition varW ∈ wrSet' ensures that the current transaction has a WW dependency towards the write part of t# (the current transaction plays the role of t in Figure 5).

The rest of the instrumentation checks that there exists a happens-before path from t# to t that does not include two consecutive RW dependencies, called an SI¬ path. This check is based on the auxiliary variables whose names are prefixed by hb and which take values in the domain {⊥, 0, 1, 2} (⊥ represents the initial value). For a variable x, hbW['x'] (resp., hbR['x']) is 0 when there exists an SI¬ path from t# to some transaction t' that writes to (resp., reads from) x and whose last dependency is not RW, 1 when the last dependency of every such SI¬ path is RW, and 2 when there exists a happens-before path from t# to such a transaction t' but no such path is an SI¬ path.

The local variable hbP has the same interpretation, except that t' is instantiated over transactions in the same process (that already executed) instead of transactions that read or write a certain variable. Similarly, the variable hb is the particular case where t' is the current transaction. The violation of the assertion at line 30 implies that hb is 0 or 1, which means that there exists an SI¬ path from t# to t.

During each transaction that executes after t#, the variable hb, characterizing happens-before paths that end in this transaction, is updated every time a new happens-before dependency is witnessed (using the values of the other variables). For instance, when witnessing a WR dependency (line 44), if there exists an SI¬ path to a transaction that writes to x, then the path that continues with the WR dependency towards the current transaction is also an SI¬ path, and the last dependency of this path is not RW. Therefore, hb is set to 0 (see line 46). Otherwise, if every path to a transaction that writes to x is not an SI¬ path, then every path that continues to the current transaction (by taking the WR dependency) remains a non-SI¬ path, and hb is set to the value of hbW['x'], which is 2 in this case (see line 48). Before ending a transaction, the value of hb can be used to update the hbR, hbW, and hbP variables, but only if those variables contain larger values (see lines 31–38).
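A minimal Python model of these update rules (Figure 6, lines 44-58) may clarify the lattice of hb values. It assumes, as a simplification, that membership of x in wrSet (resp., rdSet) is reflected by hbW['x'] (resp., hbR['x']) being different from ⊥; this is an illustrative model, not the instrumentation itself:

```python
# Model of the hb updates; BOT (None) plays the role of ⊥, and hbW/hbR map
# variable names to ⊥, 0, 1, or 2.

BOT = None

def on_read(hb, hbW, x):
    """Witness a WR dependency: the current transaction reads x,
    previously written by an hb-connected transaction (lines 44-48)."""
    if hbW.get(x, BOT) is not BOT:       # 'x' ∈ wrSet
        if hbW[x] != 2:
            return 0                     # SI¬ path extended by WR, non-RW last step
        elif hb is BOT:
            return hbW[x]                # only non-SI¬ paths reach the writer
    return hb

def on_write(hb, hbW, hbR, x):
    """Witness WW and RW dependencies: the current transaction writes x
    (lines 51-58)."""
    hb = on_read(hb, hbW, x)             # lines 51-55 mirror the WR case
    if hbR.get(x, BOT) is not BOT:       # 'x' ∈ rdSet: RW dependency
        if hb is BOT or hb > hbR[x] + 1:
            hb = min(hbR[x] + 1, 2)      # a second consecutive RW saturates at 2
    return hb
```

In particular, extending a path whose last dependency is RW (value 1) by another RW yields 2, capturing that two consecutive RW dependencies disqualify the path.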

The correctness of the instrumentation is stated in the following theorem.

**Theorem 4.** A program P is robust against PC relative to SI iff the instrumentation in Figure 6 does not violate an assertion when executed under SER.

Theorem 4 implies the following complexity result for finite-state programs. The lower bound is proved similarly to the case of CC relative to PC.

**Corollary 3.** Checking robustness of a program with a fixed number of variables and bounded data domain against PC relative to SI is PSPACE-complete when the number of processes is bounded and EXPSPACE-complete, otherwise.

Checking robustness against CC relative to SI can also be shown to be reducible (in polynomial time) to a reachability problem under SER by combining the results on checking robustness against CC relative to PC and against PC relative to SI.

**Theorem 5.** A program P is robust against CC relative to SI iff P is robust against CC relative to PC and P is robust against PC relative to SI.

Remark 1. Our reductions of robustness checking to reachability apply to an extension of our programming language where the number of processes is unbounded and each process can execute a statically known set of transactions an arbitrary number of times. This holds because the instrumentation in Figure 6 and the one in [10] (for the case of CC relative to SER) consist in adding a set of instructions that manipulate a fixed set of process-local or shared variables, which do not store process or transaction identifiers. These reductions also extend to SQL queries that access unbounded-size tables. Rows in a table can be interpreted as memory locations (identified by primary keys in unbounded domains, e.g., integers), and SQL queries can be interpreted as instructions that read/write a set of locations in one shot. These possibly unbounded sets of locations can be represented symbolically using the conditions in the SQL queries (e.g., the condition in the WHERE part of a SELECT). The instrumentation in Figure 6 needs to be adapted so that read and write sets are updated by adding sets of locations for a given instruction (represented symbolically as mentioned above).

# **6 Proving Robustness Using Commutativity Dependency Graphs**

We describe an approximate technique for proving robustness, which leverages the concept of left/right movers from Lipton's reduction theory [39]. This technique reasons about the commutativity dependency graph [9] associated with the transformation P♣ of an input program P, which allows us to simulate the PC semantics under serializability (we use a slight variation of the original definition of this class of graphs). We characterize robustness against CC relative to PC, and against PC relative to SI, in terms of certain properties that (simple) cycles in this graph must satisfy.

We recall the concept of movers and the definition of commutativity dependency graphs. Given a program P and a trace τ = t_1 · ... · t_n ∈ Tr_SER(P) of P under serializability, we say that t_i ∈ τ moves right (resp., left) in τ if t_1 · ... · t_{i−1} · t_{i+1} · t_i · t_{i+2} · ... · t_n (resp., t_1 · ... · t_{i−2} · t_i · t_{i−1} · t_{i+1} · ... · t_n) is also a valid execution of P, t_i and t_{i+1} (resp., t_{i−1}) are executed by distinct processes, and both traces reach the same end state. A transaction t ∈ Tr(P) is not a right (resp., left) mover iff there exists a trace τ ∈ Tr_SER(P) such that t ∈ τ and t does not move right (resp., left) in τ. Thus, when a transaction t is not a right mover, there must exist another transaction t' ∈ τ which caused t not to be permutable to the right (while preserving the end state). Since t and t' do not commute, this must be because of either a write-read, write-write, or read-write dependency between the two transactions. We say that t is not a right mover because of t' and a dependency relation that is either write-read, write-write, or read-write. Notice that when t is not a right mover because of t', then t' is not a left mover because of t.

We define MWR as a binary relation between transactions such that (t, t') ∈ MWR when t is not a right mover because of t' and a write-read dependency (t' reads some value written by t). We define the relations MWW and MRW corresponding to write-write and read-write dependencies in a similar way. We call MWR, MWW, and MRW the non-mover relations.

The commutativity dependency graph of a program P is a graph where vertices represent transactions in P. Two vertices are linked by a program order edge if the two transactions are executed by the same process. The other edges in this graph represent the non-mover relations MWR, MWW, and MRW. Two vertices that represent the two components t[w] and t[r] of the same transaction t (already linked by a PO edge) are also linked by an undirected edge labeled by STO (same-transaction relation).
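As an illustration, a coarse over-approximation of the non-mover relations can be computed from read/write sets alone (the precise relations additionally require reasoning about end states; the representation and names here are hypothetical):

```python
# Sketch: over-approximate the non-mover relations MWR, MWW, MRW from
# read/write set overlaps between transactions of distinct processes.

def non_mover_edges(transactions):
    """transactions: dict name -> (read_set, write_set).
    Returns the edge sets (MWR, MWW, MRW)."""
    mwr, mww, mrw = set(), set(), set()
    for t, (r1, w1) in transactions.items():
        for u, (r2, w2) in transactions.items():
            if t == u:
                continue
            if w1 & r2:
                mwr.add((t, u))   # u reads a variable written by t
            if w1 & w2:
                mww.add((t, u))   # t and u write a common variable
            if r1 & w2:
                mrw.add((t, u))   # u overwrites a variable read by t
    return mwr, mww, mrw
```

For MP♣ (Figure 7), t_1[w] writes x and t_4[r] reads x, so this yields (t_1[w], t_4[r]) ∈ MWR and (t_4[r], t_1[w]) ∈ MRW, matching two edges of the graph; the PO and STO edges come from the program structure, not from this computation.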

Our results about the robustness of a program P are stated over a slight variation of the commutativity dependency graph of P♣ (where a transaction is either read-only or write-only). This graph contains additional undirected edges that link every pair of transactions t[r] and t[w] of P♣ that were originally components of the same transaction t in P. Given such a commutativity dependency graph, the robustness of P is implied by the absence of cycles of specific shapes. These cycles can be seen as an abstraction of potential robustness violations for the respective semantics (see Theorem 6 and Theorem 7). Figure 7 pictures the commutativity dependency graph of the MP program. Since every transaction in MP is a singleton, the two programs MP and MP♣ coincide.

Fig. 7: The commutativity dependency graph of the MP♣ program.

Using the characterization of robustness violations against CC relative to SER from [10] and the reduction in Theorem 1, we obtain the following result concerning the robustness against CC relative to PC.

**Theorem 6.** Given a program P, if the commutativity dependency graph of the program P♣ does not contain a simple cycle formed by t1 ··· ti ··· tn such that:

**–** (tn, t1) ∈ MRW;
**–** (tj, tj+1) ∈ (PO ∪ MWR)∗, for j ∈ [1, i−1];
**–** (ti, ti+1) ∈ (MRW ∪ MWW);
**–** (tj, tj+1) ∈ (MRW ∪ MWW ∪ MWR ∪ PO), for j ∈ [i+1, n−1];

then P is robust against CC relative to PC.

Next we give the characterization of commutativity dependency graphs required for proving robustness against PC relative to SI.

**Theorem 7.** Given a program P, if the commutativity dependency graph of the program P♣ does not contain a simple cycle formed by t1 ··· tn such that:

**–** (tn, t1) ∈ MWW, (t1, t2) ∈ STO, and (t2, t3) ∈ MRW;
**–** (tj, tj+1) ∈ (MRW ∪ MWW ∪ MWR ∪ PO ∪ STO)∗, for j ∈ [3, n−1];
**–** ∀ j ∈ [2, n−2]:
  • if (tj, tj+1) ∈ MRW then (tj+1, tj+2) ∈ (MWR ∪ PO ∪ MWW);
  • if (tj+1, tj+2) ∈ MRW then (tj, tj+1) ∈ (MWR ∪ PO);
**–** ∀ j ∈ [3, n−3]: if (tj+1, tj+2) ∈ STO and (tj+2, tj+3) ∈ MRW then (tj, tj+1) ∈ MWW;

then P is robust against PC relative to SI.

In Figure 7, we have three simple cycles in the graph:

**–** (t1[w], t4[r]) ∈ MWR and (t4[r], t1[w]) ∈ MRW;
**–** (t2[w], t3[r]) ∈ MWR and (t3[r], t2[w]) ∈ MRW;
**–** (t1[w], t2[w]) ∈ PO, (t2[w], t3[r]) ∈ MWR, (t3[r], t4[r]) ∈ PO, and (t4[r], t1[w]) ∈ MRW.

Notice that none of the cycles satisfies the properties in Theorems 6 and 7. Therefore, MP is robust against CC relative to PC and against PC relative to SI.
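On a graph this small, the check can be replayed by brute force. The sketch below (assuming Theorem 6's cycle must close with an MRW edge and contain an MRW or MWW edge after a prefix of PO/MWR edges; it is a simplified rendering, not the decision procedure used in the paper) enumerates the simple cycles of Figure 7 and confirms that none matches the shape:

```python
from itertools import permutations

# Edges of the MP commutativity dependency graph (Figure 7).
EDGES = {
    ("t1[w]", "t2[w]"): "PO",  ("t3[r]", "t4[r]"): "PO",
    ("t1[w]", "t4[r]"): "MWR", ("t2[w]", "t3[r]"): "MWR",
    ("t4[r]", "t1[w]"): "MRW", ("t3[r]", "t2[w]"): "MRW",
}

def simple_cycles(edges):
    """Enumerate simple cycles as vertex tuples, one rotation each."""
    nodes = sorted({v for e in edges for v in e})
    cycles = set()
    for n in range(2, len(nodes) + 1):
        for perm in permutations(nodes, n):
            if perm[0] != min(perm):      # canonical rotation only
                continue
            pairs = list(zip(perm, perm[1:] + perm[:1]))
            if all(p in edges for p in pairs):
                cycles.add(perm)
    return cycles

def matches_theorem6(cycle, edges):
    """Shape check: closing edge MRW, then a (PO|MWR)* prefix
    followed by an MRW or MWW edge (simplified assumption)."""
    n = len(cycle)
    for r in range(n):                    # try every rotation t1..tn
        rot = cycle[r:] + cycle[:r]
        labels = [edges[(rot[j], rot[(j + 1) % n])] for j in range(n)]
        if labels[-1] != "MRW":
            continue
        for i in range(n - 1):            # position of the MRW/MWW edge
            if (all(l in ("PO", "MWR") for l in labels[:i])
                    and labels[i] in ("MRW", "MWW")):
                return True
    return False

cycles = simple_cycles(EDGES)
print(len(cycles))                                      # 3
print(any(matches_theorem6(c, EDGES) for c in cycles))  # False
```

The three enumerated cycles are exactly those listed above, and none satisfies the cycle shape, consistent with MP being robust against CC relative to PC.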

Remark 2. For programs that contain an unbounded number of processes, an unbounded number of instantiations of a fixed number of process "templates", or unbounded loops whose bodies contain entire transactions, a sound robustness check consists of applying Theorem 6 and Theorem 7 to (bounded) programs that contain two copies of each process template and in which each loop is unfolded exactly twice. This holds because the mover relations are "static": they do not depend on the context in which the transactions execute, and each cycle requiring more than two process instances or more than two loop iterations can be short-circuited to a cycle that also exists in the bounded program, since every outgoing edge from a third instance/iteration can also be taken from the second instance/iteration. Two copies/iterations are necessary in order to discover cycles between instances of the same transaction (the cycles in Theorem 6 and Theorem 7 are simple and cannot contain the same transaction twice). These results extend easily to SQL queries because the notion of mover is independent of the particular class of programs or instructions.

# **7 Experimental Evaluation**

We evaluated our approach for checking robustness on seven applications extracted from the literature on databases and distributed systems, together with Betting, an application designed by ourselves. Two applications were extracted from the OLTP-Bench benchmark [30]: a vote recording application (Vote) and a consumer review application (Epinions). Three applications were obtained from GitHub projects (used also in [9, 19]): a distributed lock application for the Cassandra database (CassandraLock [24]), an application for recording trade activities (SimpleCurrencyExchange [48]), and a micro social media application (Twitter [49]). The last two applications are a movie ticketing application (FusionTicket) [34] and a user subscription application inspired by the Twitter application (Subscription). Each application consists of a set of SQL transactions that can be called an arbitrary number of times from an arbitrary number of processes. For instance, Subscription provides an AddUser transaction for adding a new user with a given username and password, and a RemoveUser transaction for removing an existing user. (The examples in Figure 1 are particular variations of FusionTicket, Twitter, and Betting.) We considered five variations of the robustness problem: the three robustness problems studied in this paper, along with robustness against SI relative to SER and against CC relative to SER. The artifacts are available in a GitHub repository [31].


Table 2: Results of the experiments. The columns titled X-Y report the result of checking robustness against X relative to Y.

In the first part of the experiments, we check for robustness violations in bounded-size executions of a given application. For each application, we have constructed a client program with a fixed number of processes (2) and a fixed number of transactions of the corresponding application (at most 2 transactions per process). For each program and pair of consistency models, we check for robustness violations using the reductions to reachability under SER presented in Section 4 and Section 5 in the case of pairs of weak consistency models, and the reductions in [9, 10] when checking for robustness relative to SER.

We check for reachability (assertion violations) using the Boogie program verifier [8]. We model tables as unbounded maps in Boogie and SQL queries as first-order formulas over these maps (that may contain existential or universal quantifiers). To model the uniqueness of primary keys we use Boogie linear types.
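The modeling style can be illustrated outside Boogie. The sketch below (Python, with invented table and method names) mirrors the idea of a table as an unbounded map whose key structure itself enforces primary-key uniqueness; Boogie would express the same with maps, quantified first-order formulas, and linear types rather than executable code.

```python
# Sketch: a table as a map from primary key to row, mirroring the
# "tables as unbounded maps" modeling described above. All names
# here are illustrative, not the paper's actual encoding.

class Table:
    def __init__(self):
        self.rows = {}                     # primary key -> row

    def insert(self, key, row):
        if key in self.rows:               # primary-key uniqueness
            raise KeyError(f"duplicate primary key: {key}")
        self.rows[key] = row

    def select(self, pred):
        """A SELECT ... WHERE is a predicate over rows."""
        return [r for r in self.rows.values() if pred(r)]

users = Table()
users.insert("alice", {"name": "alice", "subscribed": True})
print(users.select(lambda r: r["subscribed"]))
# [{'name': 'alice', 'subscribed': True}]
```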

Table 2 reports the results of this experiment (cells filled with "no")<sup>10</sup>. Five applications are not robust against at least one of the semantics relative to some other stronger semantics. The runtimes (wall-clock times) for the robustness checks are all under one second, and the memory consumption is around 50 MB. Concerning scalability, the reductions to reachability presented in Section 4 and Section 5 show that checking robustness is as hard as checking reachability (the size of the instrumented program is only linear in the size of the original program). Therefore, checking robustness will also suffer from the classic state explosion problem when increasing the number of processes. On the other hand, increasing the number of transactions in a process does not seem to introduce a large overhead: increasing the number of transactions per process in the clients of Epinions, FusionTicket, and SimpleCurrencyExchange from 2 to 5 introduces a running time overhead of at most 25%.

<sup>10</sup> The Twitter client in Table 2, which is not PC vs CC robust, is different from the one described in Section 2. This client program consists of two processes, each executing FollowUser and AddTweet.

All the robustness violations we report correspond to violations of the intended specifications. For instance: (1) the robustness violation of Epinions against CC relative to PC allows two users to update their ratings for a given product and then, when each user queries the overall rating of this product, not to observe the latest rating given by the other user; (2) the robustness violation of Subscription against PC relative to SI allows two users to register new accounts with the same identifier; and (3) the robustness violation of Vote against SI relative to SER allows the same user to vote twice. The specification violation in Twitter was reported in [19]. However, it was reported as a violation of a different robustness property (CC relative to SER), while our work shows that the violation persists when replacing a weak consistency model (e.g., SI) with a weaker one (e.g., CC). This implies that this specification violation is not present under SI (since it appears in the difference between CC and SI behaviors), which could not be deduced from previous work.

In the second part of the experiments, we used the technique described in Section 6, based on commutativity dependency graphs, to prove robustness. For each application (set of transactions) we considered a program that, for each ordered pair of (possibly identical) transactions in the application, contains two processes executing that pair of transactions. Following Remark 2, the robustness of such a program implies the robustness of a most general client of the application that executes each transaction an arbitrary number of times and from an arbitrary number of processes. We focused on the cases where we could not find robustness violations in the first part. To build the non-mover relations MWR, MWW, and MRW for the commutativity dependency graph, we use the left/right mover check provided by the CIVL verifier [33]. The results are reported in Table 2 (cells filled with "yes"). We showed that the three applications Betting, CassandraLock, and SimpleCurrencyExchange are robust against every weaker semantics relative to any stronger one. As mentioned earlier, all these robustness results are established for arbitrarily large executions and clients with an arbitrary number of processes. For instance, the robustness of SimpleCurrencyExchange ensures that when the exchange market owner observes a trade registered by a user, they also observe all the other trades done by this user in the past.

In conclusion, our experiments show that the robustness checking techniques we present are effective in proving or disproving robustness of concrete applications. Moreover, they show that the robustness property for different combinations of consistency models is a relevant design principle that can help in choosing the right consistency model for realistic applications, i.e., in navigating the tradeoff between consistency and performance (in general, weakening the consistency leads to better performance).

# **8 Related Work**

The consistency models in this paper were studied in several recent works [21, 20, 25, 43, 16, 44, 14]. Most of them focused on their operational and axiomatic formalizations. The formal definitions we use in this paper are based on those given in [25, 16]. Biswas and Enea [14] show that checking whether an execution satisfies CC can be done in polynomial time, while checking whether it satisfies PC or SI is NP-complete.

The robustness problem we study in this paper has been investigated in the context of weak memory models, but only relative to sequential consistency, for Release/Acquire (RA), TSO, and Power [36, 17, 15, 29]. Checking robustness against CC and SI relative to SER has been investigated in [9, 10]. In this work, we study the robustness problem between two weak consistency models, which poses different non-trivial challenges. In particular, previous work proposed reductions to reachability under sequential consistency (or SER) that relied on a concept of minimal robustness violations (w.r.t. an operational semantics), which does not apply in our case. The relationship between PC and SER is similar in spirit to the one given by Biswas and Enea [14] in the context of checking whether an execution satisfies PC. However, that relationship was proven for a "weaker" notion of trace (containing only program order and read-from), and it does not extend to our notion of trace. For instance, that result does not imply preservation of WW dependencies, which is crucial in our case.

Several works describe over- or under-approximate analyses for checking robustness relative to SER. The works in [13, 18, 19, 26, 40] propose static analysis techniques based on computing an abstraction of the set of computations, which is used for proving robustness. In particular, [19, 40] encode program executions under the weak consistency model using FOL formulas that describe the dependency relations between actions in the executions. These approaches may return false alarms due to the abstractions they consider in their encodings. Note that in this paper, we prove a strengthening of the results of [13] with regard to the shape of happens-before cycles allowed under PC.

An alternative to trace-based robustness is state-based robustness, which requires that the sets of reachable states under the two semantics coincide. While state-based robustness is the necessary and sufficient concept for preserving state invariants, its verification, which amounts to computing the set of reachable states under the weak semantics, is in general a hard problem. The decidability and complexity of this problem have been investigated in the context of relaxed memory models such as TSO and Power, and it has been shown to be either decidable but highly complex (non-primitive recursive) or undecidable [5, 6]. Automatic procedures for approximate reachability/invariant checking have been proposed using either abstractions or bounded analyses, e.g., [7, 4, 28, 1]. Proof methods have also been developed for verifying invariants in the context of weakly consistent models, e.g., [37, 32, 41, 3]. These methods, however, do not provide decision procedures.

# **References**


gramming Languages, POPL 2017, Paris, France, January 18-20, 2017. pp. 458– 472. ACM (2017), http://dl.acm.org/citation.cfm?id=3009895


Science, vol. 8573, pp. 158–170. Springer (2014). https://doi.org/10.1007/978-3-662-43951-7_14


China. LIPIcs, vol. 118, pp. 41:1–41:18. Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2018). https://doi.org/10.4230/LIPIcs.CONCUR.2018.41



# **Verified Software Units**

Lennart Beringer

Princeton University, Princeton NJ 08544, USA eberinge@cs.princeton.edu

**Abstract.** Modularity – the partitioning of software into units of functionality that interact with each other via interfaces – has been the mainstay of software development for half a century. In the case of the C language, the main mechanism for modularity is the compilation unit / header file abstraction. This paper complements programmatic modularity for C with modularity idioms for specification and verification in the context of Verifiable C, an expressive separation logic for CompCert Clight. Technical innovations include (i) abstract predicate declarations – existential packages that combine Parkinson and Bierman's abstract predicates with their client-visible reasoning principles; (ii) residual predicates, which help enforce data abstraction in callback-rich code; and (iii) an application to pure (Smalltalk-style) objects that connects code verification to model-level reasoning about features such as subtyping, self, inheritance, and late binding. We introduce our techniques using concrete example modules that have all been verified using the Coq proof assistant and combine to fully linked verified programs using a novel, abstraction-respecting component composition rule for Verifiable C.

**Keywords:** Verified Software Unit · Abstract Predicate Declaration · Residual Predicate · Positive Subtyping · Verified Software Toolchain.

# **1 Introduction**

Separation logic [61,53] constitutes a powerful framework for verifying functional correctness of imperative programs. Foundational implementations in interactive proof assistants such as Coq exploit the expressiveness of modern type theory to construct semantic models that feature higher-order impredicative quantification, step-indexing, and advanced notions of ghost state [4,36]. On the basis of proof rules that are justified w.r.t. the operational semantics of the programming language in question, these systems perform symbolic execution and employ multiple layers of tactical or computational proof automation to assist the engineer in the construction of concrete verification scripts. Perhaps most importantly, these implementations integrate software verification and model-level validation, by embedding assertions shallowly in the proof assistant's ambient logic; this permits specifications to refer to executable model programs or domain-specific constructions that are then amenable to code-independent analysis in Coq.

To realize the potential of separation logic, such implementations must be provided for mainstream languages and compatible with modern software engineering principles and programming styles. This paper addresses this challenge

c The Author(s) 2021

N. Yoshida (Ed.): ESOP 2021, LNCS 12648, pp. 118–147, 2021. https://doi.org/10.1007/978-3-030-72019-3 5

for Verifiable C, the program logic of the Verified Software Toolchain (VST [4]). We advance Verifiable C's methodology as follows.


This paper is accompanied by a development in Coq [14] that conservatively extends VST with the VSU infrastructure and contains several case studies. In addition to the examples detailed in the paper, the Coq code treats (i) the running example ("piles") of Beringer and Appel's development [15]; we retain their ability to substitute representation-altering but specification-preserving implementations; (ii) a variant of Barnett and Naumann's Master-Clock example [12], as another example of tightly coupled program units; and (iii) an implementation of the Composite design pattern, obtained by transcribing a development from the Verifast code base [35]. In addition, a VSU interface that unifies the APIs of B<sup>+</sup>-trees and tries was recently developed by Kravchuk-Kirilyuk [40].

To see how APDs build on Parkinson and Bierman's work, consider a concrete representation predicate in the style of Reynolds [61]: list x α p specifies that address p represents a monotone list α of numbers greater than x:

$$\begin{array}{lcl} \mathit{list}\; x\; \mathit{nil}\; p & \stackrel{\text{def}}{=} & (p = \mathit{null}) \mathrel{\&} \mathsf{emp} \\ \mathit{list}\; x\; (a::\alpha)\; p & \stackrel{\text{def}}{=} & \exists q.\; a > x \mathrel{\&} p \mapsto a, q \ast \mathit{list}\; a\; \alpha\; q \end{array}$$

Being defined in terms of ↦, this definition assumes a specific data layout (a two-field **struct**). Representation-specific predicates enable verification of concrete implementations of operations such as reverse. But a client-facing specification of the entire list module should only expose the predicate in its folded form – a simple case of an abstract predicate. Indeed, while VST fully supports API exposure of **struct**s (incl. stack allocation), all examples in this paper employ an essentially "dataless" programming discipline [8,60,37] in which **struct**s are at most exposed as forward declarations. Clearly, such programmatic encapsulation should not be compromised through the use of concrete predicate definitions.
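For concreteness, unfolding this definition twice for a two-element list yields (a routine calculation, shown here only for illustration):

$$\mathit{list}\; x\; (a::b::\mathit{nil})\; p \;=\; \exists q.\; a > x \mathrel{\&} p \mapsto a, q \ast \big(\exists q'.\; b > a \mathrel{\&} q \mapsto b, q' \ast ((q' = \mathit{null}) \mathrel{\&} \mathsf{emp})\big)$$

Each unfolding step exposes one heap cell and tightens the lower bound, which is exactly the representational detail an abstract predicate is meant to hide from clients.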

To regulate whether a predicate is available in its abstract or unfolded form at a particular program point, Parkinson and Bierman employ a notion of scope: predicates are available in their unfolded form when in scope and are treated symbolically elsewhere. This separation can naturally align with the partitioning into compilation units, but it is all-or-nothing. Even in the absence of specifications, different clients need different interfaces: C developments routinely provide multiple header files for a single code unit, differing in the extent to which representational information is exposed. Mundane examples include special-purpose interfaces for internal performance monitoring or debugging. Extending this observation to specifications means supporting multiple public invariants. Indeed, several levels of visibility are already conceivable for our simple list predicate:


APDs support such flexibility by combining zero or more abstract predicate declarations (no definitions, to maintain implementation-independence) with axioms that selectively expose the predicates' reasoning principles. In parallel to programmatic forward declarations, an APD is exported in the specification interface of an API and is substantiated – in implementation-dependent fashion – in the VST proof of the corresponding compilation unit. This substantiation includes the validation of the exposed axioms. When specifying the API of a module, the engineer may not only refer to any APDs introduced by the module in question, but may also assume APDs for data structures provided by other modules (whose header files are typically **#include**d in the API in question). Matching the APD assumptions and provisions of different modules occurs naturally during the application of our component linking rule, ensuring that fully linked programs contain no unresolved APD assumptions.

Before going into technical details, we first summarize key aspects of VST.

# **2 Program verification using VST**

Verification using VST happens exclusively inside the Coq proof environment, and operates directly on abstract syntax trees of CompCert Clight. Typically, these ASTs result from feeding a C source file through CompCert's frontend, clightgen, but they may also originate from code synthesis. Either way, verification applies to the same code that is then manipulated by CompCert's optimization and backend phases. This eliminates the assurance gap that emerges when a compiler's (intermediate) representation diverges syntactically or semantically from a verification tool's representation. The absence of such gaps is the gist of VST's machine-checked soundness proof: verified programs are safe w.r.t. the operational semantics of Clight; this guarantee includes memory safety (absence of null-pointer dereferences, out-of-bounds array accesses, use-after-frees,. . . ) but also absence of unintended numeric overflows or race conditions. As Clight code is still legal C code (although slightly simplified, and with evaluation order determinized), verification happens at a level the programmer can easily grasp.

In contrast to other verification tools, VST does not require source code to be annotated with specifications. Instead, the verification engineer writes specifications in a separate Coq file. By not mixing specifications (let alone aspects of proof, such as loop invariants) with source code, VST easily supports associating multiple specifications with a function and constructing multiple proofs for a given code/specification pair.

We write function specifications φ in the form {P} ❀ {v. Q} where v denotes the (sometimes existentially quantified) return value and P and Q are separation logic assertions. To shield details of its semantic model, VST exposes heap assertions using the type **mpred** rather than as direct Coq-level predicates. On top of **mpred**, assertions are essentially embedded shallowly, giving the user access to the logical and programmatic features of Coq when defining specifications.

VST's top-level notion asserting that a (closed) program p – which must include main, with a standard specification – has been verified in Coq is ⊢ p : G ("**semax prog**"). Here, G – of type **funspecs**, i.e., associating specifications φ to function identifiers f – constitutes a witnessing proof context that contains specifications for all functions in p and must itself be justified: for each (f, φf) ∈ G, the user must exhibit a Coq proof of G ⊢ f : φf ("**semax body**"), expressing that f satisfies φf under the hypotheses in G. VST's step-indexed model ensures logical consistency in the case of (mutual) recursion.

We exploit Beringer and Appel's [15] theory of specification subsumption φ <: ψ, which extends parameter adaptation [38,50,48] to step-indexed separation logics for C and allows a function verified w.r.t. φ to be used by clients expecting specification ψ. This theory includes a notion of specification intersection ∧ which – similar to, e.g., the also combinator of the Java Modeling Language (JML, [19]) – allows functions to have multiple specifications. Noticeably, subsumption and intersection are related in formally the same manner as intersection types and subtyping are in type theory: in particular, they satisfy the law φ1 ∧ φ2 <: φi (for i ∈ {1, 2}) and the rule that ψ <: φ1 and ψ <: φ2 together imply ψ <: φ1 ∧ φ2 (cf. [58], page 206).

# **3 VSU calculus**

As described above, VST verification amounts to exhibiting a G with ⊢ p : G. In contrast to VST's previous linking regime, VSU ensures the existence of G during component linking without actually constructing G, maintaining representation hiding and non-exposure of private functions. Indeed, the modules' specification interfaces (specs of imported and exported functions) suffice for proving that a suitable G exists, as long as each module's individual justification includes the verification of its private functions.

### **3.1 Components and soundness**

VSU extends CompCert's distinction between internal functions (those equipped locally with a function body) and external functions (functions defined in other compilation units, incl. system functions). Given a Clight compilation unit p, we denote these (disjoint) sets by IntFuns(p) and ExtFuns(p), respectively. VSU further distinguishes between system functions (typically provided by the OS) and ordinary external functions: the former ones are not expected to be verified using VST even in a fully linked program, so VSU merely records their use.

VSU's main judgment is ⊢_P^S [I] p [E], to be read as: using specified imports I and system functions S, p provides/exports functions (with specifications) E, using internal memory satisfying (initially) P. The entities S, I, and E are all **funspecs**, while P specifies the memory holding p's global variables; P's formal type is **globals** → **mpred**, where **globals** refers to a map from global identifiers to CompCert values.

The judgment ⊢_P^S [I] p [E] is formally introduced as an existential abstraction (in Coq: a **Record** type) over a proof context G, which is again of type **funspecs**:

$$\vdash_{P}^{\mathcal{S}} [\mathcal{I}]\; p\; [\mathcal{E}] \;\stackrel{\text{def}}{=}\; \exists G.\; G \vdash_{P}^{\mathcal{S}} [\mathcal{I}]\; p\; [\mathcal{E}].$$

The role of G is to serve as the witness justifying the specification interface; as such it associates specifications also to p's private functions; existentially hiding it shields implementation details.

The formation of the lower-level judgment G ⊢_P^S [I] p [E] is subject to the following constraints:

**Definition 1.** Proof context G justifies a component (specification) for Clight compilation unit p with respect to system calls S, imports I, exports E, and predicate P, notation G ⊢_P^S [I] p [E], if


The first three clauses are largely administrative; they express, respectively, that (1) system functions and imported functions are disjoint sets of external functions, (2) G contains specifications for exactly the system functions and the internal functions, and (3) all exported specifications are abstractions of entries in G, in the sense of specification subsumption <:.

Clause (4) constitutes the main proof obligation and refers to a slight refactoring of VST's function-verification judgment G1 ⊢func funs : G2 (**semax func**), where funs associates CompCert function definitions with identifiers. The instantiation I ∪ G ⊢func funs_p : G hence requires that the imports I suffice for justifying all entries in G: each system function specification in G must be valid, and each specification of an internal function must be justified by a VST proof of the corresponding function body in funs; calls to internal and system functions inside the body are resolved by reference to G, and calls to external functions are resolved by the import specifications I.

Finally, clause (5) requires p's global variables to collectively satisfy P (after initialization) but avoids referring to these variables by name.

We point out two further aspects of Definition 1. First, we note that system functions may be exported (we do not require dom S ∩ dom E = ∅), and that imports and exports are distinct (dom I ∩ dom E = ∅ follows). Second, we note that for I = ∅, clause (4) yields G ⊢func funs_p : G, i.e., the heart of VST's soundness condition **semax prog** for programs comprised of a single compilation unit. Hence, the goal of VSU verification is to exhaustively apply VSU's combination rule (presented in the next subsection) until all imports have been resolved.

Once a component has been verified and is exposed as ⊢_P^S [I] p [E], the specifications of p's private functions are hidden inside the existentially quantified context G and hence inaccessible.

### **3.2 Derived rules**

It is easy to derive a rule of consequence from Definition 1 that strengthens imports and relaxes exports:

$$\frac{\mathcal{I}' <: \mathcal{I} \qquad \vdash_{P}^{\mathcal{S}} [\mathcal{I}]\; p\; [\mathcal{E}] \qquad \mathcal{E} \sqsubseteq \mathcal{E}' \qquad \forall g.\; P\,g \vdash P'\,g}{\vdash_{P'}^{\mathcal{S}} [\mathcal{I}']\; p\; [\mathcal{E}']}\;\textsc{VSUConseq}$$

For imported functions, we require pointwise subsumption, defining I′ <: I to hold if dom I′ = dom I and I′(i) <: I(i) for all i ∈ dom I. On the export side, we allow hiding of entries, defining E ⊑ E′ to hold if dom E′ ⊆ dom E and E(i) <: E′(i) for all i ∈ dom E′. The calculus is invariant in the specifications of system functions, but allows weakening of the initialization predicate. The derivation of this rule instantiates the context witnessing the concluding judgment with the (abstract) witness obtained from unfolding the hypothetical judgment.

VSU's workhorse is the composition rule, VSULink, shown in Figure 1. The side conditions treat the components symmetrically and are motivated as follows. The rule constructs a component specification for a linked program p that retains the internal functions of p1 and p2, and also any unresolved external functions, as

$$(a)\qquad \vdash_{P_1}^{\mathcal{S}_1} [\mathcal{I}_1]\; p_1\; [\mathcal{E}_1] \qquad \vdash_{P_2}^{\mathcal{S}_2} [\mathcal{I}_2]\; p_2\; [\mathcal{E}_2]$$

$$(b)\quad\begin{array}{l} \forall i \in IntFuns(p_1) \cup (ExtFuns(p_1) \setminus IntFuns(p_2)).\; p(i) = p_1(i) \\ \forall i \in IntFuns(p_2) \cup (ExtFuns(p_2) \setminus IntFuns(p_1)).\; p(i) = p_2(i) \\ dom\, p = dom\, p_1 \cup dom\, p_2 \end{array}$$

$$(c)\quad\begin{array}{l} \forall i \in (IntFuns(p_1) \cap IntFuns(p_2)) \cup (ExtFuns(p_1) \cap ExtFuns(p_2)).\; p_1(i) = p_2(i) \\ \forall i \in IntFuns(p_1) \cap ExtFuns(p_2).\; sig(p_1(i)) = sig(p_2(i)) \land i \in dom\, \mathcal{I}_2 \\ \forall i \in IntFuns(p_2) \cap ExtFuns(p_1).\; sig(p_2(i)) = sig(p_1(i)) \land i \in dom\, \mathcal{I}_1 \end{array}$$

$$(d)\qquad dom\,\mathcal{S}\_1 \cap IntFuns(p\_2) = \emptyset \qquad dom\,\mathcal{S}\_2 \cap IntFuns(p\_1) = \emptyset$$

$$(e)\quad \begin{array}{l} \forall i \in dom\, \mathcal{I}\_2 \cap \left( dom\, \mathcal{S}\_1 \cup IntFuns(p\_1) \right),\, i \in dom\, \mathcal{E}\_1 \land \mathcal{E}\_1(i) <: \mathcal{I}\_2(i) \\ \forall i \in dom\, \mathcal{I}\_1 \cap \left( dom\, \mathcal{S}\_2 \cup IntFuns(p\_2) \right),\, i \in dom\, \mathcal{E}\_2 \land \mathcal{E}\_2(i) <: \mathcal{I}\_1(i) \end{array}$$

$$(f) \qquad \forall i \in dom\, \mathcal{I}\_1 \cap dom\, \mathcal{I}\_2,\; \mathcal{I}\_1(i) = \mathcal{I}\_2(i)$$

$$(g) \qquad \mathcal{I} = \mathcal{I}\_1 \setminus \left( dom\, \mathcal{S}\_2 \cup IntFuns(p\_2) \right) \;\cup\; \mathcal{I}\_2 \setminus \left( dom\, \mathcal{S}\_1 \cup IntFuns(p\_1) \right)$$

$$(h) \qquad \frac{Vardefs(p\_1) \cap Vardefs(p\_2) = \emptyset \qquad Vardefs(p\_1) \cup Vardefs(p\_2) = Vardefs(p)}{\vdash\_{P\_1 \star P\_2}^{\mathcal{S}\_1 \Cap \mathcal{S}\_2} [\mathcal{I}] \, p \, [\mathcal{E}\_1 \Cap \mathcal{E}\_2]}$$

**Fig. 1.** VSU's rule of component composition, VSULink.

detailed in side conditions (b). Condition (c) requires functions classified identically by p1 and p2 to have identical definitions, and requires differently classified functions to have identical type signatures and be in the import set of the compilation unit not providing the implementation. Condition (d) formalizes that system functions are not locally defined in either unit. Condition (e) expresses that a function imported by one module and programmatically provided by the other module must be exported by the provider; this condition ensures that the export contract cannot be bypassed. Condition (f) expresses that functions imported by both units must be imported identically; if necessary, this can be achieved using the consequence rule. Condition (g) calculates the remaining import specifications by combining the constituent imports, removing entries for the resolved functions, and ensuring the absence of duplicates. The final condition, (h), mandates that the global variables of p1 and p2 be distinct (hence the initialization predicates have disjoint footprints) and be propagated to p.

The most interesting aspect of the rule is the twofold use of the intersection operator, C1 ⋒ C2, for constructing the concluding specifications of exported functions and system functions. The general definition of this operator is

$$\mathcal{C}\_1 \Cap \mathcal{C}\_2 := \lambda i. \begin{cases} \mathcal{C}\_1(i) \land \mathcal{C}\_2(i) & \text{if } i \in dom \, \mathcal{C}\_1 \cap dom \, \mathcal{C}\_2 \\ \mathcal{C}\_1(i) & \text{if } i \in dom \, \mathcal{C}\_1 \setminus dom \, \mathcal{C}\_2 \\ \mathcal{C}\_2(i) & \text{if } i \in dom \, \mathcal{C}\_2 \setminus dom \, \mathcal{C}\_1 \end{cases}$$

where ∧ denotes the specification intersection operator mentioned in Section 2. Thus, exporting E1 ⋒ E2 effectively exports both E1 and E2, and similarly for S1 ⋒ S2. Indeed, the individual export specifications can be reestablished using the consequence rule, as the properties of intersection specifications mentioned in Section 2 lift to (export specification) contexts: we have C1 ⋒ C2 ⊑ Ci for i ∈ {1, 2}, and X ⊑ C1 and X ⊑ C2 together imply X ⊑ C1 ⋒ C2, for any X.
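As a toy executable model of the three-case definition, one can merge two name-indexed specification tables in C. Everything here is illustrative scaffolding, not part of the VSU development: specifications are represented as opaque integers, and a COMBINED marker stands in for the intersection specification.

```c
#include <string.h>

/* Toy model of C1 ⋒ C2: a context maps function names to specification ids.
   An entry present in both contexts receives the intersection of the two
   specifications, modeled here by the COMBINED marker. */

#define COMBINED (-1)

typedef struct { const char *name; int spec; } entry;

/* Look up `name` in context c of size n; on a hit, store the spec and return 1. */
static int lookup(const entry *c, int n, const char *name, int *out) {
  for (int i = 0; i < n; i++)
    if (strcmp(c[i].name, name) == 0) { *out = c[i].spec; return 1; }
  return 0;
}

/* Merge per the three cases of ⋒; dst must have room for n1 + n2 entries. */
static int cap_merge(const entry *c1, int n1, const entry *c2, int n2, entry *dst) {
  int k = 0, s;
  for (int i = 0; i < n1; i++) {          /* dom C1: combine on overlap */
    dst[k] = c1[i];
    if (lookup(c2, n2, c1[i].name, &s)) dst[k].spec = COMBINED;
    k++;
  }
  for (int j = 0; j < n2; j++)            /* dom C2 \ dom C1: copy over */
    if (!lookup(c1, n1, c2[j].name, &s)) dst[k++] = c2[j];
  return k;
}
```

Merging a context for {f, g} with one for {g, h} yields a three-entry context in which only g carries the combined specification, mirroring that E1 ⋒ E2 exports everything either unit exports.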

By permitting functions f that are internal to both p1 and p2, VSU supports diamond-shaped composition patterns in which a sub-component, e.g. a library, is imported multiple times. Conditions (b) and (c) ensure that all copies of a repeatedly imported function f have the same body (i.e. CompCert AST), and that this body is retained in p. However, the library's export specification may have been imported differently by the different units, hence G1 and G2 may well associate different (and formally incompatible) specifications with f. As G1 and G2 are existentially hidden, we cannot inspect these specifications: adding a side condition to the rule that mentions the specifications G1(f) and G2(f) would violate the abstraction principle. Nevertheless, the proof of the composition rule still requires us to attach some specification to the shared function when constructing the witnessing context of the concluding judgment, G. Our solution is to use intersection, i.e. to instantiate the witness G with G1 ⋒ G2 in the Coq proof of VSULink. By terminating the Coq proof script with **Qed** rather than **Defined**, this instantiation is opaque to clients: applications of VSULink during program verification merely see that some G exists.

Most side conditions of the rule are computational; in our applications of the rule in Sections 4.5 and 5, Coq's tactical engine solves the majority of them.

# **4 APDs and specification interfaces**

We now turn to the organization of predicates and function specifications. Our organization reflects typical realizations of abstraction principles in C, where heap data structures are introduced using forward declarations and referred to via pointers in header files, while the selection of a concrete representation (perhaps using private static variables) is private to an implementation. We illustrate our approach using Parkinson and Bierman's connection pool example [56], ported to C as an implementation of the APIs in Figure 2.

**Fig. 2.** Connection pools in C: Connection.h (left) and Connectionpool.h (right).

Using forward declarations, the header files reveal only minimal information about the implementation. Connection.h allows clients to create a database entity (the parameter denotes a unique identifier; Parkinson and Bierman omit this constructor and do not model the type database explicitly) and to create connections to a database using the constructor consConn. Connectionpool.h models a collection of (dormant) connections associated with a database; clients construct a pool using consPool, request connections using getConn, and return them using freeConn.
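The following is a plausible C sketch of the two APIs, with minimal stub implementations so the calls described above can be exercised. The struct layouts, the next_conn counter, and the surely_malloc helper are our assumptions; the verified sources are in [14].

```c
#include <stdlib.h>

/* Sketch of Connection.h / Connectionpool.h plus stub implementations
   (all layouts and bodies are assumptions, not the verified code). */

typedef struct database *Database;
typedef struct connection *Connection;
typedef struct pool *Pool;

struct database   { int index; };
struct connection { Database db; int state; };
struct pool       { Database db; Connection dormant; };

static void *surely_malloc(size_t n) {  /* abort instead of returning NULL */
  void *p = malloc(n);
  if (!p) exit(1);
  return p;
}

static int next_conn = 0;  /* backs the abstract NextConn predicate (Sec. 4.3) */

Database newDB(int i) {
  Database d = surely_malloc(sizeof *d);
  d->index = i;
  return d;
}

Connection consConn(Database d) {
  Connection c = surely_malloc(sizeof *c);
  c->db = d; c->state = next_conn++;    /* advance the connection counter */
  return c;
}

Pool consPool(Database d) {
  Pool p = surely_malloc(sizeof *p);
  p->db = d; p->dormant = NULL;
  return p;
}

Connection getConn(Pool p) {            /* reuse a dormant connection if any */
  if (p->dormant) { Connection c = p->dormant; p->dormant = NULL; return c; }
  return consConn(p->db);
}

void freeConn(Pool p, Connection c) { p->dormant = c; }
```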

# **4.1 Abstract predicate declarations (APDs)**

Figure 3 introduces abstract predicate declarations for the three data structures. Each APD declares zero or more spatial predicates, i.e. **mpred**s relating a CompCert (pointer) value to suitable semantic information. Semantic information for the database is a DBindex (effectively a mathematical integer); connection and pool structures maintain pointers to the database; connections have additional internal state represented by the (abstract) type ConnTP.


**Fig. 3.** APDs for the connection pool example. val is CompCert's type of values.

Specifically, DatabaseAPD corresponds to the Database type declaration in Connection.h and asserts existence of a predicate DB, together with an axiom that enables clients to store a reference to a database in their own data structure. Operator !! injects a Coq proposition into VST's assertion language.

In similar style, ConnectionAPD and PoolAPD declare predicates Conn and CPool for the **struct** declarations connection and pool. In contrast to Parkinson and Bierman, we model that the connection module maintains state using the predicate NextConn. There is no need to reveal the concrete static variable used by our implementation though: **globals** denotes the collection of all such variables in VST. We assert that the head values of Conn and CPool are provably nonnull pointers and that a Conn's head pointer is furthermore valid.

All APDs are introduced as (dependent) **Record** types in Coq. We will construct values of these types in Section 4.3, i.e. implementation-dependent concrete predicate definitions and lemmas validating the axioms. But first, we use the APD types abstractly to introduce specifications for the two modules.

### **4.2 Abstract specification interfaces (ASIs)**

Abstract specification interfaces (ASIs) consist of VST specifications for the API-exposed functions, parametric in all relevant APDs. In addition to the APDs introduced above, our example uses a third APD, denoted M, that declares an abstract predicate MemM gv and represents the malloc/free library.

Figure 4 shows the ASI of Connection.h. We use subscripts to refer to the APD parameters: for example, DBD i p is the **mpred** obtained by applying the DB component of a database APD D to index i and pointer value p.


**Fig. 4.** ASI for Connection.h, parametric in databases (D), connections (C), and memory systems (M). MemM gv represents M's abstract predicate for a memory manager that is accessed by malloc and free.

A specification F(x; gv) : {Pre} ❀ {v. Post} is to be understood in safety-guaranteeing partial-correctness style, where x denotes a list of actual arguments (of type val), gv refers to (if present) the global environment, v (again of type val) represents the return value (if present), and other items are implicitly universally quantified. Callers of such a function select instantiations for the universally quantified entities ("witnesses") and must then establish Pre.

Thus, the specification of newDB asserts that a new database entity satisfying DBD i p is allocated at the return value p, for the database with index i (an input argument). The allocation draws upon the abstract predicate MemM gv which is "located" at some global variable that is private to the malloc/free library.

The specification of constructor consConn refers to MemM gv in similar fashion and advances the module's connection counter from c to some c′ upon success; in contrast to Parkinson and Bierman, we also support unsuccessful requests.

The ASI for Connectionpool.h in Figure 5 is additionally parametric in a PoolAPD, P. Our specifications are again slightly more precise than the ones given by Parkinson and Bierman. As a consequence, the precondition of a sequence such as p := consPool(d); c := getConn(p); freeConn(p, c) is DBD i d ∗ MemM gv ∗ NextConnC n gv (for the current counter value n) rather than emp, hence exposing the reliance on the memory manager etc. Prefixing the instruction d := newDB(i) establishes DBD i d; we will explain how the latter two conjuncts are provided in Section 4.5.

### **4.3 Verification of ASI-specified compilation units**

Substantiating the ASI of a header file means giving – for a concrete implementation – concrete definitions for the predicates in the newly introduced APDs,


**Fig. 5.** The ASI for Connectionpool.h is parametric in a database APD (D), a connection APD (C), a connection pool APD (P), and a memory manager APD (M). As consPool takes a formal parameter d, the reader may have expected the specification {DBD i d ∗ MemM gv} ❀ {p. CPoolP d p ∗ DBD i d ∗ MemM gv} which is indeed derivable from the one given using VST's frame rule.

show that these definitions validate the associated axioms, and finally construct a VSU that has the ASI's specifications as the export interface E. All these constructions are parametric in the APDs provided by other modules.

We refer the reader to our source code [14] for the C implementation, the (concrete) predicate definitions, and the proofs of the APD-supporting axioms. In the case of Connection.c, these proofs reveal the instantiation of the APD's ConnTP to Coq's type of integers, Z, corresponding to the existence of a global integer variable in the C code that maintains a connection counter; the corresponding points-to predicate then furnishes the abstract predicate NextConn.

The substantiations of a unit's APDs are subsequently used to instantiate its ASI and the specifications of its imported functions, yielding (together with specifications of private functions) a proof context G that the unit's local function bodies are then verified against. APDs provided by other compilation units are left abstract, so expose only their axioms. Specifically, the substantiation for Connection.c yields values c and d of types ConnectionAPD and DatabaseAPD, respectively, the predicate N = NextConn c 0, and a VSU

$$\mathsf{VSU}\_{\mathsf{Conn}} \stackrel{\text{def}}{=}\; \vdash\_{N}^{\emptyset} [\mathcal{I}\_{\mathsf{Conn}}] \; \mathit{Connection}.\mathsf{prog} \left[ \mathcal{E}\_{\mathsf{Conn}} \right],$$

where EConn is the partial specialization of the specifications in Figure 4 to C = c and D = d, Connection.prog is CompCert's AST for Connection.c, and IConn contains a specification for surelymalloc. For ConnectionPool.c, we similarly obtain a value p of type ConnectionpoolAPD and a VSU

$$\mathsf{VSU}\_{\mathsf{Pool}} \stackrel{\text{def}}{=}\; \vdash\_{\mathsf{emp}}^{\emptyset} [\mathcal{I}\_{\mathsf{Pool}}] \; \mathit{Connectionpool}.\mathsf{prog} \, [\mathcal{E}\_{\mathsf{Pool}}],$$

where IPool is comprised of the (abstract) specification of consConn and specifications for free and surelymalloc, and EPool is the partial specialization of Figure 5 to P = p. Both VSUs are parametric in M, but VSUPool's additional parameters D and C are instantiated when VSUConn and VSUPool are combined using rule VSULink. The result, VSUCP, is still parametric in M but has resolved the imports of consConn, leaving only imports for free and surelymalloc.

### **4.4 A VSU for a malloc-free library**

A recent application of VST is Appel and Naumann's verification of a malloc/free library [5]. Internally maintaining a fixed number of freelists – for entities of different size – this library exposes four functions in its API: malloc, free, pre-fill, try-pre-fill. When porting this development to the VSU framework, these give rise to two ASIs. The first one contains specifications for all four functions and is suitable for resource-aware clients. It employs the APD MallocFree-R-APD:

```
Record MallocTokenAPD := {
  malloc-token': share → Z → val → mpred;
  malloc-token'-valid-pointer: ∀ sh sz p, malloc-token' sh sz p ⊢ valid-pointer p;
  malloc-token'-facts: ∀ sh sz p, malloc-token' sh sz p ⊢ !! malloc-compatible sz p }.

Record MallocFree-R-APD :=
  { MF-Tok-R :> MallocTokenAPD; mem-mgr-R: resvec → globals → mpred }.
```

mem-mgr-R models the freelists as a resource vector that indicates the length of each freelist. The predicate malloc-token' refers to the piece of memory that is typically located at a small negative offset of a malloc'ed entity and holds administrative information of the library, but conceptually, it also constitutes a token that enables clients to share malloc'ed entities among different threads without losing the ability to safely free entities. The second ASI only exposes malloc and free, and employs the more abstract APD

```
Record MallocFreeAPD :=
  { MF-Tok :> MallocTokenAPD; mem-mgr: globals → mpred }.
```

MF-Tok still presents a malloc token but mem-mgr now hides the existence of freelists – indeed, constructing a MallocFreeAPD from a MallocFree-R-APD simply quantifies existentially over a resource vector. Our proofs first refactor the prior verification as a VSU that exports a resource-aware ASI and then use VSUConseq (and export restriction from Section 3.2) to weaken the resulting VSU to a VSU that only exports a resource-ignorant ASI. We denote the latter as VSUMF; the predicate MemM gv is now revealed to be a shorthand for mem-mgr gv, parametric in a MallocFreeAPD M, and we use MemMF gv below to refer to its instantiation for VSUMF.

### **4.5 Putting it all together**

Using VSULink again, we link VSUCP with a library VSU (reducing surelymalloc to malloc and the system function exit) and then with VSUMF, obtaining

$$\mathsf{VSU}\_{\mathsf{AppLib}} \stackrel{\text{def}}{=}\; \vdash\_{\mathsf{Mem}\_{\mathsf{MF}} \ast N}^{\mathcal{S}\_{\mathsf{Core}}} [\;] \; \mathit{coreprog} \; [\mathcal{E}\_{\mathsf{Core}}].$$

Here, coreprog contains all code (application plus library) with the exception of main. Note that VSUAppLib's set of imports is empty; SCore contains axiomatic specifications of OS functions such as exit and mmap.

Independently from the construction of VSUAppLib, we verify main, i.e. an exemplary client or unit test, as a **semax body** statement w.r.t. a not yet instantiated copy of ECore. The specification that main is verified against is a <:-specialization of VST's general main-spec but is still abstract in the APDs of the application's code modules – see [14] for details.

Finally, we connect VSUAppLib with the verification of main to obtain a proof of VST's **semax prog** statement. It is in this last proof that the satisfaction of the abstract initialization predicates for the global variables, MemMF and N, is established from VST's internal initialization predicates.

# **5 Modular verification of the Subject/Observer pattern**

Programs in imperative or object-oriented languages often contain callbacks: chains of function calls A.m() → B.n() → A.l() between modules A and B in which m's invocation of n (and hence the return of control to A in the call to l) happens when A's state is invalid, i.e. does not satisfy A's invariant. Clearly, mandating satisfaction of the invariant in l's precondition – a typical requirement of API-level specifications – then prevents the verification of n.

A typical example is the chain update → notify → get in the subject-observer pattern, a widely used design pattern [23] that has served as a litmus test for modular specification of callback-rich programming in the literature. Figures 6 and 7 contain excerpts of a transcription of Parkinson's [55] code into C.<sup>1</sup> Each Subject maintains a list of subscribers – a list of observers that will be notified whenever the Subject's state is updated and then synchronize their internal state accordingly using get. The intended invariants express that each Subject's observers are in sync – a property that is violated during update's traversal of its observer list, when not-yet-notified observers are out of sync but (precisely in order to get back in sync) nevertheless invoke get.

The dominant technique for dealing with such situations in SMT-based tools employs ghost fields that track validity and unfolding of invariants and are supported by further (ghost) infrastructure that controls ownership (see e.g. [47,11]). However, this does not necessarily achieve comprehensive representation hiding: for example, the permission to violate Subject's invariant in get's precondition propagates to the precondition of notify, allowing the latter function to access the field<sup>2</sup> Subject.value. Furthermore, the invariant-regulating techniques typically require that SMT solving be carried out on a whole-program basis.

The flexibility of APDs to introduce multiple predicates enables an alternative in which callbacks are specified using special-purpose predicates that – similar to typestates [62] – emphasize protocol-style behavior, do not reveal the

<sup>1</sup> Our implementation [14] contains two further callbacks, newObs <sup>→</sup> registr <sup>→</sup> notify and registr → notify → get; the former one commences in the constructor, before any invariant has been established.

<sup>2</sup> For example, one may insert abstraction-violating get/putfield instructions in the subject-observer code at http://comcom.csail.mit.edu/e4pubs/{#}observer. This tool implements an advanced variant of invariance regulation using ghost instructions, semantic collaboration [59], for Eiffel. Fields are not private, and the methodology does not prevent representation exposure between such closely coupled classes.

```
/* SubjectObserver.h */
typedef struct subject *Subject;
typedef struct observer *Observer;

/* Subject.h */
#include "SubjectObserver.h"
Subject newSubject(void);
void registr(Subject s, Observer o);
void update(Subject s, int n);
int get(Subject s);
int freeSubject(Subject s);
Observer detachfirst(Subject s);

/* Observer.h */
#include "SubjectObserver.h"
Observer newObs(Subject s);
void notify(Observer o);
int val(Observer o);
void freeObserver(Observer o);

/* Subject-rep.h */
#include "SubjectObserver.h"
typedef struct node *Node;
struct node { Observer obs; struct node *next; };
struct subject { Node obs; unsigned value; };

/* Observer-rep.h */
#include "SubjectObserver.h"
struct observer { Subject sub; int cache; };
```

**Fig. 6.** Subject/Observer: header files. The left column shows the public APIs; Subject-rep.h and Observer-rep.h are private to their respective module implementations.


**Fig. 7.** Excerpts from Subject.c and Observer.c for the callback update → notify → get.
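The callback chain of the excerpt can be sketched as follows, over the representation structs of Figure 6. The bodies below are our reconstruction from the surrounding prose, not the verified code from [14]; only the call chain update → notify → get is essential.

```c
#include <stddef.h>

/* Representation structs as in Fig. 6 */
typedef struct subject *Subject;
typedef struct observer *Observer;
typedef struct node *Node;

struct node { Observer obs; struct node *next; };
struct subject { Node obs; unsigned value; };
struct observer { Subject sub; int cache; };

int get(Subject s) { return (int)s->value; }

void notify(Observer o) {
  /* callback into Subject while update's traversal is still in progress:
     the invariant "all observers in sync" does not hold here */
  o->cache = get(o->sub);
}

void update(Subject s, int n) {
  s->value = (unsigned)n;              /* observers are now out of sync */
  for (Node p = s->obs; p != NULL; p = p->next)
    notify(p->obs);                    /* re-synchronizes one observer */
}
```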

validity of module invariants, and maintain representational hiding by being just as abstract as a module's main predicate.

Concretely, our approach employs semantic subjects that are comprised of a list of observer references and a (current) value, while observers are represented as a subject pointer and the cache:

```
Definition SubjRep := (list val) ∗ Z.
Definition ObsRep := val ∗ Z.
```
Next, our APDs complement the predicates relevant for API calls by external clients, Srep and Orep, by (residual) predicates for calling the Subject functions registr, update, and get, and the Observer functions notify and val; we also introduce a predicate for the postcondition of get, GetPost:

```
Record SubjectAPD := {
  Srep, RegPre, UpdPre, GetPre, GetPost: SubjRep → val → mpred;
  SubjRegister: ∀ S s, Srep S s ⊢ RegPre S s;
  SubjUpdate: ∀ S s, Srep S s ⊢ UpdPre S s;
  SubjGetPrePost: ∀ S s, Srep S s ⊢ GetPre S s ∗ (GetPost S s -∗ Srep S s);
  GetPre-ptrnull: ∀ S s, GetPre S s ⊢ !!(is-pointer-or-null s) }.

Record ObserverAPD := {
  Orep, NtfyPre, ValPre: ObsRep → val → mpred;
  ObsNtfy: ∀ O o, Orep O o ⊢ NtfyPre O o;
  ObsVal: ∀ O o, Orep O o ⊢ ValPre O o;
  NtfyPre-isptr: ∀ O o, NtfyPre O o ⊢ !!(isptr o) }.
```

Entailment axioms such as SubjUpdate permit external clients to invoke callback functions directly but may be omitted for functions that should only be invoked via callbacks. The residual predicates sanction indirect invocations via callbacks without revealing the satisfaction status of module-internal invariants.

Axiom SubjGetPrePost splits Srep into a token that can (only) be used to invoke get, plus a token for reestablishing Srep from GetPost. The latter is a separating implication −∗ rather than an entailment: it represents the requirement that an observer yields back control to its subject after completing a callback to get – the subject had retained part of its state prior to invoking notify.
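To see how the axiom is used, here is a proof sketch (our paraphrase) deriving notify's large-footprint specification from a small-footprint one; the assumed shape of the small specification, taking NtfyPre ∗ GetPre to Orep ∗ GetPost, is consistent with the function-pointer variant shown in Section 5.1.

```latex
% Sketch: split Srep via SubjGetPrePost, run the callback, close the wand.
\begin{align*}
  \mathsf{NtfyPre}\,(s,c)\,o \ast \mathsf{Srep}\,S\,s
    &\vdash \mathsf{NtfyPre}\,(s,c)\,o \ast \mathsf{GetPre}\,S\,s
        \ast (\mathsf{GetPost}\,S\,s \mathbin{-\!\!\ast} \mathsf{Srep}\,S\,s)
    && \text{(SubjGetPrePost)} \\
    &\rightsquigarrow \mathsf{Orep}\,(s,\mathit{snd}\,S)\,o \ast \mathsf{GetPost}\,S\,s
        \ast (\mathsf{GetPost}\,S\,s \mathbin{-\!\!\ast} \mathsf{Srep}\,S\,s)
    && \text{(small spec of notify, framed)} \\
    &\vdash \mathsf{Orep}\,(s,\mathit{snd}\,S)\,o \ast \mathsf{Srep}\,S\,s
    && \text{(wand elimination)}
\end{align*}
```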

To enforce these behaviors, we employ the specifications in Figures 8 and 9; again, the ASIs are parametric in all APDs mentioned, notwithstanding the mutual dependence of the modules. Using axiom SubjGetPrePost, one may show


**Fig. 8.** ASI of Subject (excerpt), parametric in a SubjectAPD (SP), an ObserverAPD (OP), and a MemoryAPD (M).

that the specifications for get and notify are in a subsumption relationship with large-footprint counterparts that permit invocations by external clients:

$$\{\mathsf{Srep}\_{\mathsf{SP}} \, S \, s\} \rightsquigarrow \{p. \, !!(p = snd \, S) \, \&\& \, \mathsf{Srep}\_{\mathsf{SP}} \, S \, s\}$$

$$\{\mathsf{NtfyPre}\_{\mathsf{OP}}\,(s,c) \, o \ast \mathsf{Srep}\_{\mathsf{SP}}\, S\; s\} \rightsquigarrow \{\mathsf{Orep}\_{\mathsf{OP}}\,(s, snd\, S) \, o \ast \mathsf{Srep}\_{\mathsf{SP}}\, S\; s\}$$

The specification of update makes reference to an auxiliary Coq function

Observers (P: ObsRep → val → mpred) (s: val) (vals: list Z) (l: list val): mpred

that represents the "big" separating conjunction $\ast\_{(v,o)\,\in\, combine(vals,\,l)}\, P\,(s,v)\,o$.

The substantiation of these interfaces relative to our C implementations defines the main predicates as

$$\textbf{Definition}\ \mathsf{Srep}\ (l, v)\ s := \exists o.\ \mathsf{listrep}\ l\ o \ast s \xrightarrow{Ews}\_{STP} (o, v) \ast \mathsf{Mtok}(Ews, STP, s).$$

$$\textbf{Definition}\ \mathsf{Orep}\ O\ o := o \xrightarrow{Ews}\_{OTP} O \ast \mathsf{Mtok}(Ews, OTP, o).$$


**Fig. 9.** ASI of Observer, parametric in APDs SP, OP, and M.

Here, listrep is a typical list representation predicate over Node items, modeling the observers associated with a Subject. STP and OTP are shorthands for Clight's representation of the struct definitions Subject and Observer, Ews represents an exclusive writable share in VST, and Mtok(., ., .) is a variant of predicate malloc-token' from Section 4.4.

Some residual predicates are minor variants of Srep and Orep. For example,

$$\textbf{Definition}\ \mathsf{NtfyPre}\ O\ o := \mathsf{Mtok}(Ews, OTP, o) \ast \exists v.\ o \xrightarrow{Ews}\_{OTP} (fst\ O, v).$$

existentially abstracts over snd O but is otherwise identical to Orep. This makes validating axiom ObsNtfy trivial. As NtfyPre does not depend on a subject's value, no modification of the latter can affect the former. Other residual predicates – like RegPre – are even definitionally equal to the main predicates, but the APD mechanism ensures that this fact is not exposed to clients.

Our C implementation permits GetPre and GetPost to actually be defined identically (indeed, getters typically don't alter data structures…):

$$\textbf{Definition}\ \mathsf{GetPrePost}\ (l, v)\ s := s.\mathsf{value} \xrightarrow{Ews}\_{STP} v \ast \mathsf{Mtok}(Ews, STP, s).$$

Here, $p.\pi \xrightarrow{sh}\_{t} v$ is a variant of $p \xrightarrow{sh}\_{t} v$ that specifies the content at p.π, where path π is a list of field names and array subscripts. Thus, GetPrePost only specifies the content of s.value; the remaining portion of s is exactly what is retained when SubjGetPrePost splits off GetPre from a Subject. The motivation for this handling is that the invariant of the loop in update (which contains the callback to get via notify) only traverses the node list. Specifically, an invariant involving the full Srep would not ensure that the spine of the list remains unchanged, as the definition of Srep quantifies existentially over the node list. This aspect illustrates the danger of predicates that are too abstract to be useful.

Constructing VSUs for Subject and Observer proceeds straightforwardly; we exercise VSU's support for shared libraries by first combining surelyMalloc with each of these VSUs separately, before linking the resulting VSUs with each other, with VSUMF, and with a main client as described in Section 4.5.

### **5.1 Specification and proof reuse**

To evaluate specification modularity and proof reuse, we verified several variations of our implementation. First, to evaluate robustness under representational change, we have Subject internally maintain a freelist of Observer nodes:

```
struct subject { Node fl; Node obs; unsigned value; };
```

The freelist is drawn upon in registr (we only invoke surely-malloc if fl is null) and replenished in detachfirst. Constructor newSubject creates an empty freelist, and freeSubject frees the entire list.
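A sketch of how registr and detachfirst might manage the freelist under this layout; the bodies are assumptions based on the description above, not the verified code.

```c
#include <stdlib.h>

typedef struct subject *Subject;
typedef struct observer *Observer;
typedef struct node *Node;

struct node { Observer obs; struct node *next; };
struct subject { Node fl; Node obs; unsigned value; };   /* freelist variant */
struct observer { Subject sub; int cache; };

static void *surely_malloc(size_t n) {
  void *p = malloc(n);
  if (!p) exit(1);
  return p;
}

void registr(Subject s, Observer o) {
  Node n;
  if (s->fl != NULL) { n = s->fl; s->fl = n->next; }  /* reuse a free node */
  else n = surely_malloc(sizeof *n);                  /* malloc only if fl empty */
  n->obs = o; n->next = s->obs; s->obs = n;
}

Observer detachfirst(Subject s) {
  Node n = s->obs;
  if (n == NULL) return NULL;
  Observer o = n->obs;
  s->obs = n->next;
  n->next = s->fl; s->fl = n;                         /* replenish the freelist */
  return o;
}
```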

The code modification triggers new Clight ASTs, but the majority of Coq files can then simply be reprocessed: the model-level definitions, APDs, and ASIs of Subject and Observer remain unchanged, and so do the files associated with verifying Observer, linking, and main. The only modifications are in the implementation-dependent validation of Subject, namely in the definitions of the representation predicates and in the VST proofs of the individual functions.

Second, we verified a variant in which notify's invocation of get is replaced by a function pointer. The key code modifications are

```
/* Addition in SubjectObserver.h */
typedef int (*callback)(Subject s);

/* Modification in Observer.h */
void notify (Observer o, callback f);

/* Modification in Observer.c */
void notify (Observer o, callback f) { o->cache = f(o->sub); return; }
```

The calls to notify in update and registr obtain the additional argument &get, and the specification of get can be removed from the imports of the Observer VSU. The small specification of notify becomes

> notify(o, g) : {NtfyPreOP (s, c) o ∗ GetPreSP S s ∗ funcptr φget g} ❀ {OrepOP (s, snd S) o ∗ GetPostSP S s}

where funcptr φ g expresses that value g is a pointer to some function satisfying specification φ, and φget is the entry for get from Fig. 8. notify's large specification is adapted similarly. Repairing the proofs incurs changes in < 10 lines of Coq.
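A minimal runnable sketch of the function-pointer variant; the struct layouts follow Figure 6 (minus the node list) and the bodies are our assumptions.

```c
typedef struct subject *Subject;
typedef struct observer *Observer;

struct subject  { unsigned value; };
struct observer { Subject sub; int cache; };

typedef int (*callback)(Subject s);

int get(Subject s) { return (int)s->value; }

/* notify receives the callback as an argument instead of importing get */
void notify(Observer o, callback f) { o->cache = f(o->sub); }
```

At call sites, update and registr now pass &get, so the Observer module no longer imports get.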

A third modification exploits VST's support for impredicative quantification to abstract over GetPreSP and GetPostSP in the definition of φ, such that notify's specification is effectively parametric in suitable GetPre/GetPost pairs. Adapting the verification involves step-indexed aspects of VST and hence requires a little more work; details are included in the Coq development [14].

Finally, we verified a variation in which observers register with two subjects, as an example of a more complex interaction pattern. As this affects model-level functionality, modifications are not confined to module-internal predicate definitions but affect APD declarations and ASI definitions. However, neither the encapsulation of representation nor the modularity of verification was compromised; supporting more than two subjects per observer would likely be similar.

### **5.2 Pattern-level specification**

An alternative specification of subject-observer was proposed by Parkinson [55], who sidesteps the conflict between callbacks, modularity, and abstraction. Giving up on specifying the two classes independently, this approach defines a single abstract predicate, SubObs, that ties a subject to all its observers and yields aggregate-level function specifications. We can recover such an aggregate interface by proving that the specifications involving SubObs are abstractions (in the sense of . <: .) of the exports of the SubjectObserver VSU, generically in APDs SP and OP. Indeed, Parkinson's formulation amounts to a two-predicate APD:

```
Record AggAPD :=
  { Sub: val → list val → Z → mpred; Obs: val → val → Z → mpred }.
```

with specifications shown in Figure 10, using the derived notions

```
Definition SubObs s O v := Sub s O v ∗ ∗o∈O Obs o s v.
Definition Obs' o s := ∃v. Obs o s v.   (∗ Obs' is related to Obs by existentially quantifying the value. ∗)
Definition SubObs' s O := ∃v. SubObs s O v.
```


**Fig. 10.** Selected aggregate specifications, parametric in an AggAPD A. Except for the occurrence of MemM gv, the specifications coincide with Parkinson [55]'s specifications.

Constructing an AggAPD A from a SP/OP pair is trivial: take Sub to be SrepSP and Obs to be OrepOP; proving the . <: . lemmas is then straightforward.

SubObs constitutes a pattern invariant, or the pattern's primary predicate, with residuals Sub and Obs. From the aggregate's point of view, update → notify → get is not a callback but an internal nesting of invocations, so the small-footprint specifications typically don't pose a problem for existing methodologies; client-visible specifications with large footprints can be derived using the frame rule. In this sense, the pattern reestablishes "sequential atomicity" of operations. Exploring whether other design patterns can be similarly derived from the ASIs of their constituent classes is a topic for future research: are typical design patterns the abstraction units at which sequential atomicity is reestablished, at which callbacks only occur in valid states, and at which residual predicates can be avoided?

An aggregate specification for the function pointer implementation from Section 5.1 can be obtained using a modified AggAPD, with residual predicates GetPre etc. But a better option is to remove the pattern-internal functions notify, registr, and perhaps even get from the aggregate ASI. In fact, notify's new signature reveals the use of function pointers, hence even an aggregate-level specification would have to include funcptr φ g terms. Thus, we instead employ the notion from Section 3.2 to lift the VSU for SubjectObserver with function pointers from Section 5.1 to a VSU for the aggregated but narrowed ASI and then reverify main w.r.t. the latter.

# **6 Verification of object principles**

This section considers features that – together with state encapsulation and modularity – are cornerstones of object orientation: the ability for (instances of) multiple implementations of an interface to dynamically coexist and interact, dynamic dispatch, subtyping, self, and inheritance. To maintain the dataless discipline, we employ a uniform but simple object encoding that is typical for industrial and open-source C developments: dynamic dispatch is implemented using function pointers that are bundled into separate **struct**s (method tables) that are accessible as the first element of the object representations. Subtyping – providing additional methods – and representation inheritance are modeled by extending these **struct**s, respectively, but are orthogonal to each other, and only the former one is exposed in APIs. In the second half of this section, we hide the dynamic dispatch mechanism behind a wrapper interface. We specify objects by reference to a semantic (Coq-level) object model, thus comprehensively separating object reasoning from C-level reasoning: constructors establish, and methods maintain, abstract object predicates that clients need not (and cannot) unfold.

We again proceed in stages, using the widely used running example of points located on a one-dimensional axis (see e.g. [18]). Figure 11 shows a preliminary API for basic, bumpable, and colored points, organized in a simple subtyping relationship. We provide multiple implementations for each interface (using


**Fig. 11.** PointInterface.h, containing three interfaces for one-dimensional points

different data representations), each exposing its set of constructors in a separate header file – Figure 12 shows implementation **I1**. Clients select an implementation during object creation but cannot otherwise distinguish between them: method dispatch selects the appropriate function from the method table, as in

BPoint bp = makeBPoint-I1(4); **int** i = ((BMethods)(bp→mtable))→get((Point)bp);


**int** get-I1 (Point p) { **return** ((**struct** point-I1 ∗)p)→value; }
**void** set-I1 (Point p, **int** i) { ((**struct** point-I1 ∗)p)→value = i; **return**; }
**void** bump-I1 (BPoint p) { ((**struct** bpoint-I1 ∗)p)→value++; **return**; }
**int** getC-I1 (CPoint p) { **return** ((**struct** cpoint-I1 ∗)p)→color; }
BPoint makeBPoint-I1 (**int** i) {
  **struct** bpoint-I1 ∗p = (**struct** bpoint-I1 ∗)surely-malloc(**sizeof** ∗p);
  BMethods m = (BMethods)surely-malloc(**sizeof** ∗m);
  m→get = &get-I1; m→set = &set-I1; m→bump = &bump-I1;
  p→value = i; p→mtable = m;
  **return** (BPoint)p;
}

**Fig. 12.** Implementation **I1**. Constructors makePoint-I1 and makeCPoint-I1 omitted. A second implementation **I2** employs representations point-I2 etc. and exposes constructors makeBPoint-I2 etc.

The basis of object specifications is a general method table predicate:

$$\mathsf{MTable}(T, k, \mathit{names}, m, \mathit{specs}, \mathcal{I}) \doteq \mathsf{Mtok}(\mathsf{Ews}, k, m) * \exists\,\pi.\ !!(\mathsf{readable}(\pi))\ \&\&\ \mathop{*}_{(\mu,\phi)\in \mathit{names}\times \mathit{specs}} \exists\,v.\ \mathsf{funcptr}(\phi\ \mathcal{I}, v) * m.\mu \mapsto_{\pi}^{k} v$$

It asserts that the **struct** m (of shape k) contains at field names names pointers to functions satisfying specs, where I is of Coq type Pred(T) = (T ∗ val) → mpred and specs has type list (Pred(T) → funspec). A generic object layout predicate

$$\mathcal{N}\ T\ \mathit{tbl}\ (\sigma\ k:\mathsf{type})\ \mathit{names}\ \mathit{specs}\ (x:T*\mathit{val}):\mathsf{mpred} = \exists\,\delta\ \mathcal{I}.\ \mathcal{I}\,x * \mathsf{Mtok}(\mathsf{Ews},\delta,\mathit{snd}\ x) * \exists\,m.\ (\mathit{snd}\ x).\mathit{tbl} \mapsto_{\sigma}^{\mathsf{Ews}} m * \mathsf{MTable}(T,k,\mathit{names},m,\mathit{specs},\mathcal{I})$$

then combines a specified method table (located at field tbl) with the requirement that the (memory identified by the) object pointer satisfy I. C types σ and δ represent the object's static and dynamic types. The joint use of I in MTable and N ensures that an object's methods agree with its data component on what representation predicate should be maintained. The existential abstraction over I ensures representation hiding: external clients merely see an invariant of (Coq) type Pred T. Thus, different C implementations of an object interface may employ different representations but still satisfy the same external specification.

Specifically, we introduce Coq-level object interface types in the style of Hofmann and Pierce's object model [30]:

**Record** PointM (X:Type):Type := { get : X → Z; set : X → Z → X }.
**Record** BPointM (X:Type):Type := { PointM-**of**-BPointM :> PointM X; bump : X → X; bumpable : X → Prop }.
**Inductive** Color:Type := blue | red | green.
**Record** CPointM (X:Type):Type := { BPointM-**of**-CPointM :> BPointM X; getC : X → Color; color-code : Color → Z }.

The parameters X represent semantic object representations. On the one hand, we may instantiate these and define Coq-level behaviors, like m1, bm1, cm1:

**Record** PointRep := { value : Z }.
**Record** CPointRep := { pointRep :> PointRep; color : Color }.
**Definition** m1: PointM PointRep := {| get := fun s ⇒ value s; set := fun s i ⇒ {| value := i |} |}.
**Definition** bm1: BPointM PointRep := {| PointM-**of**-BPointM := m1; bump := fun s ⇒ {| value := value s + 1 |}; bumpable := fun s ⇒ min-signed ≤ value s < max-signed |}.
**Definition** cm1: CPointM CPointRep := {| ... (∗details omitted∗) |}.

But the interface types also enable specifications for get(p) and set(p, j):

get spec T (P : PointM T) = λI : Pred T. {I(t, p)} ❀ {get T P t. I(t, p)}
set spec T (P : PointM T) = λI : Pred T. {min signed ≤ j ≤ max signed & I(t, p)} ❀ {I(set T P t j, p)}

Thus, each method has a Coq-level counterpart that is parametric in (semantic) representations and behaviors. To specify the constructors, we first define specializations of N for the three interfaces by instantiating with the appropriate method specifications and syntactic elements:

P T (P : PointM T) : Pred T = N T mtable point methods [get; set] [get spec T P; set spec T P]
B T (B : BPointM T) : Pred T = N T mtable bpoint bmethods [get; set; bump] [get spec T B; set spec T B; bump spec T B]
C T (C : CPointM T) : Pred T = N T mtable cpoint cmethods [get; set; bump; getC] [get spec T C; set spec T C; bump spec T C; getC spec T C].

Here, point, bpoint, cpoint and methods, bmethods, cmethods are the **struct**s defined in the header file (Figure 11) and mtable, get,..., getC are the field names in these **struct**s. The exemplary spec for base point constructors is then

makePoint(i; gv) : {min signed ≤ i ≤ max signed & MemM gv} ❀ {p. MemM gv ∗ P T P (Init Point(i), p)}.

Verifying **I1** and **I2** then yields VSUs whose export interfaces tie makePoint-I1 and makePoint-I2 to the specialization of this constructor to P := m1, and similarly for the other constructors. The resulting objects behave indistinguishably; the existential quantification over I in the definition of N carries over to P, B, and C, ensuring that the representational differences between **I1** and **I2** are hidden from clients: when verifying a method call, clients unroll P etc., but each time receive a "fresh" symbolic representation predicate I.

Wrapper-based verification The unrolling of object predicates corresponds to the exposure of the method table in our API. Programmatically, better encapsulation is provided by wrappers that hide the function pointer mechanism, like

**int** GET (Point p) { Methods m = p→ mtable; **return** (m→ get(p)); }

The header file for these wrappers resembles the API of an ADT, but merely disguises object-orientation: we still support multiple implementations (using the same constructors as above), and operations are still invoked using dynamic dispatch. On the specification side, wrappers can be modeled as an APD

**Record** WrapperAPD := { Wr-Pt: ∀ T, PointM T → Pred T; Wr-BPt: ∀ T, BPointM T → Pred T; Wr-CPt: ∀ T, CPointM T → Pred T }.

with one constructor per interface, in resemblance to the use of class names to index predicate families [56]. The VSU for the wrapper then encapsulates the object predicates P etc., exporting an ASI with specifications such as

GET(p) : {Wr-Pt W T P (t, p)} ❀ {get T P t. Wr-Pt W T P (t, p)}
∧ {Wr-BPt W T P (t, p)} ❀ {get T P t. Wr-BPt W T P (t, p)}
∧ {Wr-CPt W T P (t, p)} ❀ {get T P t. Wr-CPt W T P (t, p)}

We can further improve client-side usability by replacing these intersection specifications by a deep embedding of the three interface alternatives; this eliminates a corresponding case distinction in client-side proofs, when symbolic execution reaches the invocation of a wrapper function. As an example, we verified a linked list module that permits insertion of basic, bumpable, or colored points and provides map operations that apply SET, BUMP, . . . to all elements. Each element may internally employ **I1** or **I2**. Of course, the precondition of mapping BUMP requires all elements to be of dynamic type (at least) BPoint and have a bumpable coordinate; however, this condition emerges as a constraint on semantic objects and can be discharged without unfolding object representation predicates.

Self and late binding Verification using the above constructions fails for methods whose body contains virtual calls on self : the definition of N effectively separates the object's data region from the method table upon method entry, making only the former accessible inside the body. To overcome this limitation, we define a variant of N using the higher-order recursive functor

$$\mathcal{F}\,(\mathcal{I}\ X : \mathsf{Pred}\ T) : \mathsf{Pred}\ T \doteq \lambda(x : T*\mathit{val}).\ \exists\,\delta\ m.\ \mathcal{I}\,x * \mathsf{Mtok}(\mathsf{Ews},\delta,\mathit{snd}\ x) * (\mathit{snd}\ x).\mathit{tbl} \mapsto_{\sigma}^{\mathsf{Ews}} m * \triangleright\,\mathsf{MTable}(T,k,\mathit{names},m,\mathit{specs},X)$$

in which I is now a parameter (we eschew the parameters T, ..., specs for readability) and X plays the role of N. Recursion via X is protected by VST's later modality ▷ [4]; indeed, any access to a method table inside a method happens at least one step later than the method's own invocation. Contractiveness of F (proven in VST) ensures the existence of a fixed point F(I) := HORec(F(I)). Recovering the quantification over I, we then replace N with N∗ := ∃ I. F(I). With this modification in place, one may verify virtual calls on self, like a variant of **I1** that implements bump using get and set (still w.r.t. m1, bm1, and cm1).

An important application of self is (observably behavior-altering) method overriding. At the semantic level, Hofmann and Pierce explicate how positive

subtyping supports both early and late binding variants of overriding; these differ in whether the observable behavior of bump (when implemented in terms of get and set) is affected when a subclass subsequently overrides set to, say, reset the coordinate to 0. Furthermore, method overriding may affect how functions defined in a superclass act on subclass-introduced state components. For example, one may impose that updating the coordinate turns a point's color blue. Semantically, all these variations yield novel behaviors m2, bm2, cm2, etc. that can be compared to the earlier behaviors using Hofmann and Pierce's theory. As a consequence of our two-level reasoning, and the choice to parameterise constructor/method specifications by behaviors, we can leverage their techniques: implementations **I3**, **I4**, ... that realize the overriding variants can be verified as further VSUs for our earlier export interface, by (now) specializing the constructor specifications to m2, etc. Afterwards, the modified behaviors propagate through dynamic dispatch and wrappers as expected, permitting clients of e.g. the list module to map bump over elements with different behavior. Side conditions during symbolic method calls refer exclusively to semantic objects and behaviors, do not necessitate the unrolling of representation predicates, and can often be discharged by simplification.

# **7 Discussion**

Related and future work Certified Abstraction Layers (CAL, [24,26]) are used in the CertiKOS project [25] to verify feature-rich operating system kernels and hypervisors in Coq. CAL permits horizontal and vertical composition of components, and establishes full abstraction between the imports and exports. CAL's methodology was recently rephrased as a synthesis from a systems-oriented DSL, DeepSEA, to C, with a CompCert backend [64]. However, "(T)here is no use of C pointers and no built-in support of dynamic memory allocation (every DeepSEA object is realized as a set of static variables), so programs that need dynamic allocation will have to implement it themselves" ([64], page 10). While this fragment remarkably suffices for the intended application area, it is unlikely to satisfy general-purpose programmers or compiler writers for other systems languages.

Ironclad Apps and IronFleet [29,28] are systems based on Dafny and TLA+ for verifying safety and liveness of distributed systems, and app security. By connecting model-level, concurrency-aware reasoning, state-machine refinement, and Floyd-Hoare verification, their approach provides abstraction-bridging functionality similar to that of proof-assistant-based reasoning, trading off TCB size and foundational integration in a logical framework against automation and developer productivity. Ironclad Apps compile to verified assembly; IronFleet employs a formally unverified route via Dafny and the .NET compiler for C#.

ÜberSpark [67] is a system based on Frama-C and SMT for compositionally verifying commodity system software written in C and assembly. ÜberSpark's primary applications are hypervisor components and OS kernels, but it currently addresses only safety and security properties (memory separation, control-flow integrity, information flow) rather than functional correctness. The same limitation applies to proof-carrying code systems [49,3,6,13,27], at (virtual) machine or assembly level. Several PCC systems proposed hierarchies of formalisms that connect operational semantics, a general-purpose program logic, and tactical checkers or algorithmic inference systems for higher-level type systems, abstract interpretation, or program analyses [16,2,17,1]. VST's tactical automation is optimized for symbolic execution and functional correctness, but the underlying proof rules could equally well be used to prove soundness of static analyses or code synthesizers; we expect our structuring principles for separate compilation will be just as useful in these scenarios as they are for functional correctness.

McKinna and Burstall [45] pioneered the use of existential abstraction to formally tie programs to their specifications and proofs in a modern proof assistant. VSU realizes aspects of their vision of deliverables for a mainstream language but is at this point not endowed with similarly rigorous categorical underpinnings.

Representation hiding in separation logic can also be obtained using hypothetical frame rules [54,10], but no such rule is provided by VST at present. Pragmatically, the two approaches appear complementary: modules that expose interesting state (e.g. a list ADT, the point objects,...) favor existential abstraction/APDs, as clients can access associated reasoning principles on demand, at specific program points. In contrast, modules like the resource-unaware memory manager might benefit from hypothetical framing: the predicate MemM gv carries no client-relevant information but still needs to be carried around in many function specifications in our treatment.

VST's specification subsumption resembles behavioral subtyping [44,42], a notion commonly used in verification tools for Java-like languages for relating specifications across a class hierarchy. Exploring the relationship between our use of positive subtyping, other notions of subtyping and inheritance, and Liskov's Substitution Principle [43] constitutes future work.

By supporting field update, Hofmann and Pierce's theory addresses shortcomings of purely functional object models, but its support for object aggregates or complex ownership structures appears limited and not much studied. A two-level encoding could likely also be developed for concurrency-inspired object models [33,32,31], perhaps by adapting the theory of interaction trees [68,39]. However, VST's partial-correctness interpretation of triples limits the end-to-end usefulness of coinductive reasoning. A recent proposal for integrating statically typed Smalltalk-inspired objects into a functional calculus is Wyvern [52].

In the context of SMT-based verification tools, Parkinson and Bierman [57] highlight examples that go beyond behavioral subtyping, and Summers et al. [66] identify a catalog of advanced uses of class invariants. We intend to apply VST to the former soon; a better understanding of the latter could perhaps commence by recasting Drossopoulou et al.'s general framework for object invariants [22] in separation logic. However, some aspects of class/object invariants may not immediately transfer from Java-like to Smalltalk-style object models.

In Java, an object's representation remains constant over its lifetime. By separately quantifying over I, our pre- and postconditions may support dynamic representation change a la Fickle [21] (with suitable updates to the method table), as long as both representations fit into an object's top-level **struct**.

Krishnaswami et al. [41] verify subject-observer and other patterns (iterators, flyweight, factory) by equipping a functional language, Idealized ML, with effectful specifications based on higher-order separation logic. Their verification was partially formalized in a predicative Hoare Type Theory/Ynot and employs abstract module definitions that combine code and specification. Their use of separating implication can likely be transferred to our setting, but their implementation does not separate the functionality of subjects and observers to the same extent and thus does not raise the same specification challenges. Considerate reasoning [65], object propositions [51], and multi-object languages such as Rumer [9] are alternatives in the design space spanned by invariant techniques, aliasing/separation, and ownership; all validate variants of the Composite pattern.

Extrapolating from our exploration of the Composite pattern, it appears feasible to generate VST specifications, loop invariants, and APD declarations from VeriFast [34]; synthesizing full proofs will be more challenging.

Object encodings in the Linux kernel, GTK/GObject, or the SQLite database engine deviate from the Smalltalk tradition and expose APIs that are not fully dataless. We suspect these systems also differ from standard language-level object disciplines in their need for deeply layered ownership control or model-level object aggregates. Like Schreiner's encoding [63], these systems thus provide interesting opportunities for future case studies.

Conclusion The ability of type theory to capture modularity and abstraction is well-established. But while, e.g. Mitchell and Plotkin's insight has been highly influential in the world of functional programming, it has not yet made its way into verification tools for mainstream languages. Taking inspiration from their work, we introduced Verified Software Units as a general component calculus for VST, and developed an infrastructure for separating the declarations of abstract predicates from concrete predicate definitions. We showed that residual predicates support callbacks which violate operation atomicity, as is the case in the subject-observer pattern. Finally, we introduced a two-level approach to specifying object principles, yielding a simple logic for Smalltalk-style objects in C. Together, these innovations substantially advance VST's capability to verify modular C developments that employ diverse programming styles.

Acknowledgments: This work was funded by the National Science Foundation under the awards 1005849 (Verified High Performance Data Structure Implementations, Beringer) and 1521602 (Expedition in Computing: The Science of Deep Specification, Appel). The author is grateful to the members of both projects for their feedback and greatly appreciates the reviewers' comments and suggestions.

# **References**

1. Ahmed, A., Appel, A.W., Richards, C.D., Swadi, K.N., Tan, G., Wang, D.C.: Semantic foundations for typed assembly languages. ACM Trans. Program. Lang. Syst. **32**(3), 7:1–7:67 (2010), https://doi.org/10.1145/1709093.1709094


Flinn, J., Levy, H. (eds.) 11th USENIX Symposium on Operating Systems Design and Implementation, OSDI '14. pp. 165–181. USENIX Association (2014), https://www.usenix.org/conference/osdi14/technical-sessions/presentation/hawblitzel


Formal Methods - 19th International Symposium. LNCS, vol. 8442, pp. 514–530. Springer (2014), https://doi.org/10.1007/978-3-319-06410-9_35


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# An Automated Deductive Verification Framework for Circuit-building Quantum Programs

Christophe Chareton<sup>1,2</sup>, Sébastien Bardin<sup>2</sup>, François Bobot<sup>2</sup>, Valentin Perrelle<sup>2</sup>, and Benoît Valiron<sup>1</sup>

<sup>1</sup> LMF, CentraleSupélec, Université Paris-Saclay, Gif-sur-Yvette, France (firstname.lastname@lri.fr)
<sup>2</sup> CEA, LIST, Université Paris-Saclay, Palaiseau, France (firstname.lastname@cea.fr)

Abstract. While recent progress in quantum hardware opens the door to significant speedup in certain key areas, quantum algorithms are still hard to implement right, and the validation of such quantum programs is a challenge. In this paper we propose Qbricks, a formal verification environment for circuit-building quantum programs, featuring both parametric specifications *and* a high degree of proof automation. We propose a logical framework based on first-order logic, and develop the main tool we rely upon for achieving the automation of proofs of quantum specifications: PPS, a parametric extension of the recently developed path-sum semantics. To back up our claims, we implement and verify parametric versions of several famous and non-trivial quantum algorithms, including the quantum parts of *Shor's integer factoring*, quantum phase estimation (QPE) and Grover's search.

Keywords: deductive verification, quantum programming, quantum circuits

# 1 Introduction

1.1 Quantum computing. Quantum programming is seen as a potential revolution for many computing applications: cryptography [61], deep learning [7], optimization [23,22], solving linear systems [33], etc. In all of these domains, current quantum algorithms beat the best known classical algorithms by either quadratic or even exponential factors. In parallel to the rise of quantum algorithms, the design of quantum hardware has moved from lab benches [14] to programmable, 50-qubit machines designed by industrial actors [4,38], reaching the point where quantum computers beat classical computers for specific tasks [4]. This has spurred a shift from a theoretical standpoint on quantum algorithms

to a more programming-oriented view with the question of their concrete coding and implementation [66,65,55].

In this context, an important problem is the adequacy between the mathematical description of an algorithm and its concrete implementation as a program.

Fig. 1: The hybrid model

<sup>©</sup> The Author(s) 2021 N. Yoshida (Ed.): ESOP 2021, LNCS 12648, pp. 148–177, 2021. https://doi.org/10.1007/978-3-030-72019-3\_6

1.2 The hybrid model. The vast majority of quantum algorithms are described within the quantum co-processor model [42], i.e. a hybrid model where a classical computer controls a quantum co-processor holding a quantum memory (cf. Figure 1). The co-processor is able to apply a fixed set of elementary operations (buffered as quantum circuits) to update and query (measure) the quantum memory. Importantly, while measurement allows one to retrieve classical (probabilistic) information from the quantum memory, it also modifies it (destructive effect). The quantum memory state is represented by a linear combination of possible concrete values, generalizing the classical notion of probabilities to the complex case, and the core of a quantum algorithm consists in successfully setting the memory in a specific quantum state.

Major quantum programming languages such as Quipper [30], LIQUi|⟩ [67], Q# [64], ProjectQ [63], Silq [8], and the rich ecosystem of existing quantum programming frameworks [55] follow this hybrid model. Such circuit-building quantum languages are the current consensus for high-level executable quantum programming languages.

1.3 The problem with quantum algorithms. Starting from an initial state, a quantum algorithm typically describes a series of high-level operations which, once composed, realize the desired state. Each high-level operation may itself be described in a similar way, until one reaches elementary operations (quantum gates). Describing an algorithm therefore requires both to list these elementary operations, or quantum circuit, and to specify the circuit's behavior.

A major issue is then to verify that the quantum circuit generated by the code written as an implementation of a given algorithm is indeed a run of this algorithm, and that the circuit indeed has the specified shape characteristics (for instance, a polynomial size).

While testing and debugging are the common verification practice in classical programming, they become extremely complicated in the quantum case: debugging and assertion checking are problematic due to the destructive aspect of quantum measurement, the probabilistic nature of quantum algorithms seriously impedes system-level quantum testing, and classical emulation of quantum algorithms is (strongly believed to be) intractable. On the other hand, nothing prevents a priori the formal verification [15] of quantum programs.

1.4 Goal and challenges. Our goal is to provide an automated formal verification framework for circuit-building quantum programs. Such a framework should satisfy the following principles: (1) Parametricity: it should allow parametric (i.e. scale-invariant) specifications and proofs, so as to enable the generic specification and verification of parametrized implementations. This is crucial as quantum algorithms always describe parametrized families of circuits; (2) Proof automation: it should, as far as possible, provide automatic proof means in order to ease adoption.

Prior works on quantum formal verification do not fully reach these goals together, as they are either not parametric, or not automated. Model-checking methods [27,70] are fully automatic but not parametric – moreover they are highly scale-sensitive. Recently, Amy [1,2] developed a powerful framework for

reasoning over quantum circuits, the path-sums symbolic representation. Thanks to their good compositional properties, reasoning with path-sums is well automated and can scale up to large problem instances (up to 100 qubits). Yet, the method is not parametric and only addresses fixed-size circuits. On the other side of the spectrum, several approaches deal with parametricity but sacrifice automation, as they generate proof obligations in higher-order logic, supported by proof assistants such as Coq or Isabelle/HOL. One can cite the approach of Boender et al. [10], Qwire [53,58], SQIR [35,34] and QHL [68,71,46,69,45]. Combined with the standard matrix semantics for quantum circuits, which we show in Section 8 to be cumbersome for automation, these approaches have so far verified only very few realistic quantum programs in a parametric way [45,35,34].

1.5 Proposal. We propose Qbricks, an automated formal verification framework for circuit-building quantum programs, featuring parametric specification together with a high degree of proof automation.

We bring two key innovations along the road: (Key 1) we propose the new parametrized path-sums (PPS) symbolic representation of families of quantum circuits, extending path-sums [1] to the parametric case while keeping good compositional properties. PPS prove extremely useful both as a specification mechanism and as an automation mechanism; (Key 2) we carefully tune together our programming language (Qbricks-DSL) and specification logic (Qbricks-Spec) so that the corresponding verification problem remains automatable in practice — first-order proof obligations — while the framework is still expressive enough to write, specify and verify realistic quantum programs (Shor order finding — Shor-OF [61], QPE [41,16], Grover [31]).

	- A flexible symbolic representation for reasoning about quantum states, building upon the recent path-sum symbolic representation [1,2]. Our representation, called parametrized path-sums (PPS), retains the compositional and closure properties of regular path-sums while allowing genericity and parametricity of both specifications and proofs. Especially, first-order logic together with PPS provide a unified and powerful way to reason about many essential quantum concepts (Section 5.2) and fit well with the standard way of describing quantum algorithms. We are the first to highlight this connection and make PPS a "first-class" concept, where prior works are limited to standard path sums, or rely on the standard matrix semantics;
	- A programming and verification framework, that is: on one hand, a core domain-specific language (Qbricks-DSL, Section 4) for describing families of quantum circuits, with enough expressive power to describe parametric circuits from non-trivial quantum algorithms; and on the other hand, a first-order logical (domain-specific) specification language (Qbricks-Spec, Section 5), tightly integrated with PPS and Qbricks-DSL to specify properties of parametrized programs representing families of quantum circuits. The careful interplay between these two components yields first-order proof obligations, and thus is a key aspect of proof automation;

Additional technical material can be found in the online extended version [13]. Implementation and benchmarks are available online [54].

1.7 Discussion. The scope of this paper is limited to proving properties of circuit-building quantum programs. We do not claim to support right now the interactions between classical data and quantum data (referred to as "classical control" in the literature), nor the probabilistic side-effect resulting from measurement. Still, we are already able to target realistic implementations of famous quantum algorithms, and thanks to equational theories for complex and real numbers we can automatically reason on the probabilistic outcome of a measurement. Also, we do not claim any novelty in the proofs of Shor-OF, QPE or Grover by themselves, but rather the first highly automated parametric correctness proofs of the circuits produced by programs implementing them, and the first parametric correctness proof of an implementation of Shor-OF.

# 2 Background: Quantum Algorithms and Programs

While in classical computing the state of a bit is either 0 or 1, in quantum computing [50] the state of a quantum bit (or qubit) is described by amplitudes over the two elementary values 0 and 1 (denoted in the Dirac notation by $|0\rangle$ and $|1\rangle$), i.e. linear combinations $\alpha_0|0\rangle + \alpha_1|1\rangle$ where $\alpha_0$ and $\alpha_1$ are any complex values satisfying $|\alpha_0|^2 + |\alpha_1|^2 = 1$. In a sense, amplitudes are a generalization of probabilities. More generally, the state of a qubit register of n qubits

<sup>3</sup> QPE is a major quantum building block, at the heart of, e.g., the HHL [33] algorithm for solving linear systems in logarithmic time, or quantum simulation [28].

("qubit-vector") is any superposition of the $2^n$ elementary bit-vectors ("basis elements", where a bit-vector $k \in \{0..2^n-1\}$ is denoted $|k\rangle_n$), that is, any $|u\rangle_n = \sum_{k=0}^{2^n-1} \alpha_k |k\rangle_n$ such that $\sum_{k=0}^{2^n-1} |\alpha_k|^2 = 1$. For example, in the case of two qubits, the basis is $|00\rangle$, $|01\rangle$, $|10\rangle$ and $|11\rangle$ (also abbreviated $|0\rangle_2$, $|1\rangle_2$, $|2\rangle_2$ and $|3\rangle_2$). Such a (quantum state) vector $|k\rangle_n$ is called a ket of length $n$ (and dimension $2^n$).

Technically speaking, we say that the quantum state of a register of n qubits is represented by a normalized vector in a Hilbert space of finite dimension $2^n$ (a.k.a. a finite-dimensional Hilbert space), whose basis is generated by the Kronecker product (a.k.a. tensor product, denoted ⊗) over the elementary bit-vectors. For instance, for n = 2: $|0\rangle\otimes|0\rangle$, $|0\rangle\otimes|1\rangle$, $|1\rangle\otimes|0\rangle$ and $|1\rangle\otimes|1\rangle$ act as definitions for $|00\rangle$, $|01\rangle$, $|10\rangle$ and $|11\rangle$.

2.1 Quantum data manipulation. The core of a quantum algorithm consists in manipulating a qubit register through two main classes of operations. (1) Quantum gate: a local operation on a fixed number of qubits, whose action consists in the application of a unitary map to the corresponding quantum state vector, i.e. a linear and bijective operation preserving norm and orthogonality. The fact that unitary maps are bijective ensures that every unitary gate admits an inverse. Unitary maps over $n$ qubits are usually represented as $2^n \times 2^n$ matrices. (2) Measurement: the retrieval of classical information out of the quantum memory. This operation is probabilistic and modifies the state of the quantum register: measuring the $n$-qubit system $\sum_{k=0}^{2^n-1} \alpha_k |k\rangle_n$ returns the bit-vector $k$ of length $n$ with probability $|\alpha_k|^2$. Quantum gates may be applied in sequence or in parallel: sequential application corresponds to map composition (or, equivalently, matrix multiplication), while parallel application corresponds to the Kronecker product, or tensor product, of the original maps (equivalently, the Kronecker product of their matrix representations).<sup>4</sup>
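As a minimal numerical illustration of these background definitions (a numpy sketch of our own, not part of Qbricks), the following encodes a two-qubit state, applies Hadamard gates in parallel via the Kronecker product, and computes measurement probabilities:

```python
import numpy as np

ket0 = np.array([1, 0], dtype=complex)                        # |0>
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)   # Hadamard gate

# Parallel application of gates = Kronecker product of their matrices;
# sequential application would be matrix multiplication.
state = np.kron(H, H) @ np.kron(ket0, ket0)                   # (H ⊗ H)|00>

# Measuring returns bit-vector k with probability |alpha_k|^2.
probs = np.abs(state) ** 2                                    # uniform over the 4 basis kets
assert np.isclose(probs.sum(), 1.0)                           # amplitudes are normalized
```

Each of the four basis kets is observed with probability 1/4, as expected for a uniform superposition.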

2.2 Quantum circuits. In a way similar to classical Boolean functions, the application of quantum gates can be written in a diagrammatic notation: quantum circuits. Qubits are represented with horizontal wires and gates with boxes. Circuits are built compositionally, from a given set of atomic gates and by a small set of circuit

Fig. 2: The circuit for QPE

<sup>4</sup> Given two matrices A (with r rows and c columns) and B, their Kronecker product is the matrix
$$A \otimes B = \begin{pmatrix} a_{11}B & \cdots & a_{1c}B \\ \vdots & \ddots & \vdots \\ a_{r1}B & \cdots & a_{rc}B \end{pmatrix}.$$
This operation is central in quantum information representation. It enjoys a number of useful algebraic properties such as associativity, bilinearity and the equality $(A \otimes B)\cdot(C \otimes D) = (A\cdot C)\otimes(B\cdot D)$, where · denotes matrix multiplication.

combinators, including parallel and sequential composition, circuit inversion, controlling, iteration, ancilla creation, etc. As an example of a quantum circuit, we show in Figure 2 the bird's-eye view of the circuit for QPE, the (quantum) phase estimation algorithm, a standard primitive in many quantum algorithms. QPE is parametrized by n (a number of wires) and U (a unitary oracle) and is built as follows. First, a register of n qubits is initialized in state $|0\rangle$, while another one is initialized in state $|v\rangle_n$. Then comes the circuit itself: a structured sequence of quantum gates, using the unary Hadamard gate H, the circuits $U^{2^i}$ (realizing U to the power $2^i$) and the inverse Quantum Fourier Transform inverse(QFT(n)). Sub-circuits $U^{2^i}$ and inverse(QFT(n)) are themselves defined in a similar way.

Here, one should simply note two things: (1) the circuit is made of parallel compositions of Hadamard gates and of sequential compositions of controlled $U^{2^i}$ (the controlled operation is depicted with vertical lines and the symbol •); (2) the circuit is parametrized by n and by U. This is very common: in general, a quantum algorithm constructs a circuit whose size and shape depend on the parameters of the problem. It describes a family of quantum circuits.
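The action of QPE can be illustrated numerically. In the hedged numpy sketch below (the function name is ours, not Qbricks'), we exploit the fact that on an eigenvector of U with eigenphase φ, the Hadamard layer followed by the controlled powers of U leaves the counting register in the state $\frac{1}{\sqrt{2^n}}\sum_m e^{2i\pi\varphi m}|m\rangle$; the inverse QFT then concentrates the amplitude on $|k\rangle$ whenever φ = k/2^n:

```python
import numpy as np

def qpe_distribution(phi, n):
    """Phase estimation with n counting qubits, run on an eigenvector of U
    with eigenphase phi. The counting register after the controlled powers
    is (1/sqrt(2^n)) * sum_m e^{2 pi i phi m} |m>; we then apply the
    inverse QFT and return the measurement distribution."""
    N = 2 ** n
    state = np.exp(2j * np.pi * phi * np.arange(N)) / np.sqrt(N)
    iqft = np.exp(-2j * np.pi * np.outer(np.arange(N), np.arange(N)) / N) / np.sqrt(N)
    return np.abs(iqft @ state) ** 2

# For phi = k / 2^n (here k = 3, n = 3), all the probability mass lands on |k>.
probs = qpe_distribution(phi=3 / 8, n=3)
```

When φ is not an exact multiple of 1/2^n, the distribution instead peaks near the best n-bit approximation of φ.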

2.3 Reasoning on circuits and the matrix semantics. Quantum circuits essentially describe unitary operators [50] acting on Hilbert spaces. In finite dimension, unitary matrices faithfully represent unitary operators: they were the original mathematical formalism for circuits, coined here as the matrix semantics. While this representation is well-adapted to simple high-level circuit combinators such as the action of control or inversion, it is not well-suited for specifying the behavior of many complex circuits from the literature. Because of this cumbersomeness, textbook descriptions of circuits make use of an informal representation: operators are described by their action on a basis vector (see, for example, the description of Shor-OF in [50, p. 232]). This is however understood as a shortcut notation for matrices, which remain the main medium for reasoning on circuits. Formal approaches to quantum computation [35,34,53,58,68,71,46,69,45] witness this prevalence of matrices as circuit representation.

2.4 Path-sum representation. Path sums [1,2] are a recent symbolic representation. Their strength is to formalize the notation used in the quantum algorithm literature (e.g., [50]). A unitary operator U is written as $U : |x\rangle \mapsto PS(x)$ where x is a bit vector and PS(x) is defined with the syntax of Fig. 3. In the figure, addition and multiplication over the reals are denoted respectively by + and ·, and $x_{[i]}$ is the i-th projection of bit vector x. The term n is an integer index, characterizing the range of the path-sum. Each term $k \in \{0, \dots, 2^n-1\}$ in the path-sum is then defined through:


This representation is closed under functional composition and Kronecker product. For instance, if U is defined as in Fig. 3 and if V sends y to $PS'(y)$ defined

$$\begin{aligned} PS(x) & ::= \frac{1}{\sqrt{2}^n} \sum\_{k=0}^{2^n - 1} e^{2 \cdot \pi \cdot i \cdot P\_k(x)} |\phi\_k(x)\rangle \\ P(x) & ::= \frac{x\_{[j]}}{2^k} \mid P(x) \cdot P(x) \mid P(x) + P(x) \\ |\phi(x)\rangle & ::= |b\_{[1]}(x)\rangle \otimes \dots \otimes |b\_{[n]}(x)\rangle \\ b\_{[j]}(x) & ::= x\_{[j']} \mid \neg b\_{[j']}(x) \mid b\_{[j']}(x) \wedge b\_{[j']}(x) \mid b\_{[j']}(x)\ \mathrm{XOR}\ b\_{[j']}(x) \mid \mathtt{tt} \mid \mathtt{ff} \end{aligned}$$

Fig. 3: Syntax for regular path-sums [2,1]

$$\text{as } \frac{1}{\sqrt{2}^{n'}} \sum\_{k=0}^{2^{n'}-1} e^{2 \cdot \pi \cdot i \cdot P\_k'(y)} |\phi\_k'(y)\rangle \text{, then } U \otimes V \text{ sends } |x\rangle \otimes |y\rangle \text{ to }$$

$$\frac{1}{\sqrt{2}^{n+n'}} \sum\_{j=0}^{2^{n+n'}-1} e^{2 \cdot \pi \cdot i (P\_{j/2^{n'}}(x) + P\_{j\%2^{n'}}'(y))} |\phi\_{j/2^{n'}}(x)\rangle \otimes |\phi\_{j\%2^{n'}}'(y)\rangle \tag{1}$$

which is in the form shown in Figure 3. The compositionality of this semantics is used by Amy [2] to prove the equivalence of large circuit instances. Nonetheless, its main limitation lies in the fact that path-sums only address fixed-size circuits. Albeit a compositional tool, useful to automate proofs, it cannot be used for proving properties of parametrized circuit-building quantum programs.
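To make the closure properties concrete, here is a small numpy check (our own illustration, with hypothetical helper names): the path-sum of the Hadamard gate reproduces the columns of H, and tensoring two path-sums reproduces the columns of H ⊗ H, matching the reindexing of Eq. (1):

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)

def ps_hadamard(x):
    """Path-sum of H: (1/sqrt2) * sum_k e^{2 pi i (k*x/2)} |k>."""
    return np.array([np.exp(2j * np.pi * k * x / 2) for k in (0, 1)]) / np.sqrt(2)

# The path-sum applied to a basis ket |x> is exactly column x of the matrix.
for x in (0, 1):
    assert np.allclose(ps_hadamard(x), H[:, x])

# Closure under Kronecker product: tensoring the two amplitude vectors yields
# the path-sum of H ⊗ H (column 2x+y of np.kron(H, H)).
for x in (0, 1):
    for y in (0, 1):
        tensored = np.kron(ps_hadamard(x), ps_hadamard(y))
        assert np.allclose(tensored, np.kron(H, H)[:, 2 * x + y])
```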

This paper proposes an extension of path-sum semantics to address the parametric verification of quantum programs.

# 3 Introducing PPS

In this section, we introduce the main logical apparatus of our framework: parametrized path-sums. We first present a motivating example and then discuss the construction.

3.1 Motivating example. Let us consider the n-indexed family of circuits consisting of n Hadamard gates in sequence, as shown in Figure 4. Sequencing two Hadamard gates can easily be shown equivalent to the identity operation. In other words, when fed with $|0\rangle$, if n is even the circuit outputs $|0\rangle$. Albeit small, this circuit family together with its simple

Fig. 4: Motivating Example

specification exemplifies the typical framework we aim at in the context of certification of quantum programs.

– The description of the circuit family is parametrized by a classical parameter (here, the non-negative integer n);


The circuit family presented in Figure 4 will be used in the rest of the paper as a running toy example for Qbricks. In particular, we show in Example 1 how to code it in our framework and how to express its specification in Example 4. Its parametrized implementation in Qbricks is three lines of code long and its specification takes six lines. It is proved by induction over the parameter n, the induction step requiring two lemma calls (depending on the parity of parameter n).

3.2 Parametrizing path-sums. In order to formalize the semantics of the example of Fig. 4, we aim at generalizing path-sums.

Illustration. For a fixed n, the circuit $C_n$ implements either the identity (when n is even), in which case the path-sum is $PS_{Id}(x) = \frac{1}{\sqrt{2}^0}\sum_{k=0}^{2^0-1} e^{2i\pi\cdot 0}|x\rangle$, or the Hadamard gate (when n is odd), in which case the path-sum is $PS_H(x) = \frac{1}{\sqrt{2}^1}\sum_{k=0}^{2^1-1} e^{2i\pi\frac{k\cdot x}{2}}|k\rangle$. A candidate parametric path-sum for the family of circuits $\{C_n\}_n$ from Figure 4 could then be written in factorized form as

$$PS\_n(x) = \frac{1}{\sqrt{2}^{n\%2}} \sum\_{k=0}^{2^{n\%2}-1} e^{2i\pi \frac{(n\%2)\cdot k \cdot x}{2}} |\mathtt{if\ even}(n)\ \mathtt{then}\ x\ \mathtt{else}\ k\rangle. \tag{2}$$

Generalization. In general, parametrized Path Sums (PPS) are defined over a language of typed terms with possibly free (typed) variables. At the very minimum the language has to be equipped with Boolean values (to handle the values of the ket-vector) and integers (for instance to handle the range).

Given such a language, a PPS is a path-sum where the range, the phase polynomial and the basis-ket can in general be explicit, open terms referring to external parameters. Formally, a PPS is defined as a function taking a set of parameters $\overline{p}$ as input and outputting:


Then, the behaviour of a parametrized quantum circuit C(p) is described as the i/o function inputting a basis ket |x(p) of length the width of C(p) and outputting the parametrized sum term:

$$\begin{aligned} \mathtt{pps\\_apply}(h,\overline{p})(|x(\overline{p})\rangle) &= \\ \frac{1}{\sqrt{2}^{\mathtt{pps\\_range}(h,\overline{p})}} \sum\_{y\in\mathtt{BV}\_{\mathtt{pps\\_range}(h,\overline{p})}} e^{2i\pi \cdot \mathtt{pps\\_angle}(h,\overline{p})(x,y)} &|\mathtt{pps\\_ket}(h,\overline{p})(x,y)\rangle. \end{aligned}$$

For the sake of readability, we often omit the explicit mention of the parameters. For instance, the PPS P induced by (2) is parametrized by the integer n. It is such that for any int n, pps\_width(P, n) = 1 and pps\_range(P, n) = n%2. Furthermore, for any bit vectors x, y of lengths 1 and n%2, pps\_ket(P, n)(x, y) is equal to x if n is even and to y otherwise, and pps\_angle(P, n)(x, y) = (n%2 · x[0] · y[0])/2. One then gets expression (2) by applying pps\_apply(P, n).
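As a sanity check of this PPS, the following numpy sketch (a hypothetical rendering of the accessors, not the actual Qbricks-Spec functions) evaluates the apply operation for the example P and compares it against the matrix power H^n:

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)

def pps_apply_P(n, x):
    """Evaluate the PPS P of Eq. (2) on basis input x, for parameter n:
    range = n % 2, ket = (x if n even else y), angle = (n%2) * x * y / 2."""
    r = n % 2
    out = np.zeros(2, dtype=complex)
    for y in range(2 ** r):                      # sum over the path variable y
        ket = x if n % 2 == 0 else y
        angle = (n % 2) * x * y / 2
        out[ket] += np.exp(2j * np.pi * angle) / np.sqrt(2) ** r
    return out

# The PPS agrees with the matrix semantics H^n on both basis inputs.
for n in range(6):
    for x in (0, 1):
        assert np.allclose(pps_apply_P(n, x), np.linalg.matrix_power(H, n)[:, x])
```

For even n the sum degenerates to the single term |x⟩ (the identity), and for odd n it is exactly the Hadamard path-sum.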

Hence, the term language needed for describing PPS of otherwise sophisticated families of quantum circuits can afford to be minimal: first-order typed terms equipped with an equational theory are enough. We also found that first-order predicate logic is suitable for specifying the properties of quantum programs: there is no need for higher-order logic such as that of Coq or Isabelle/HOL. This is the key to automation.

# 4 Qbricks-DSL

Qbricks-DSL is the (domain-specific) language of our framework. It is designed as a first-order, functional language aimed at circuit description. Measurement is out of the scope of the language, and all Qbricks-DSL expressions are terminating. We follow a very simple strategy for circuit building: we use a regular inductive datatype for circuits, where the data constructors are elementary gates, sequential and parallel composition, and ancilla creation. In particular, unlike Quipper [30] or Qwire [53], a quantum circuit is not a function acting on qubits: it is a simple, static object. Nonetheless, as illustrated by our experiments (Section 8), this impedes neither expressiveness nor parametricity.

Furthermore, even though the language does not feature measurement, it is nonetheless possible to reason about the probabilistic outputs circuits would yield if their results were measured. Indeed, this can be expressed in a regular theory of real and complex numbers (see Section 6.5).

4.1 Syntax of Qbricks-DSL. Qbricks-DSL is a small first-order functional, call-by-value language with a special datatype circ as the medium to build and manipulate circuits. The core of Qbricks-DSL can be presented as the simply-typed calculus presented in Figure 6. The basic data constructors for circ are CNOT, SWAP, ID, the Hadamard superposition gate H, phase shift gate Ph(e) and the parametrized rotation Rz(e). The constructors for high-level circuit operations are sequential composition SEQ, parallel composition PAR and ancilla creation/termination ANC (see Figure 5 for details).

Fig. 5: Circuit combinators

Fig. 6: Syntax for Qbricks-DSL

On top of circ, the type system of Qbricks-DSL features the type of integers int (with constructors n, one for each integer n), the type of Booleans bool (with constructors tt and ff) and the type of n-ary products (with constructor ⟨e1, ..., en⟩). This type system is not meant to be exhaustive and it can be extended with the usual constructs such as floats, lists and other user-defined inductive datatypes: its embedding into WhyML makes it easy to use such types. The term constructs are limited to function calls, let-style composition, tests with if-then-else and simple iteration: iter f n a stands for f(f(··· f(a)···)), with n calls to f. We again stress that this could easily be extended; we just do not need it.

The language is first-order: this is reflected by the types A of expressions. The type of a function is given by the types of its arguments and the type of its output. The type of a function with inputs of types A<sup>i</sup> and output of type B is written A<sup>1</sup> ×···× A<sup>n</sup> → B.

A function f is either a function f<sup>d</sup> defined by a declaration d or a constant function f<sup>c</sup>. Functions defined by declarations must not be mutually recursive: this small, restricted language only features iteration. Constant functions consist of integer operators (+, ∗, −, etc.), Boolean operators (∧, ∨, ¬, →, etc.),

$$\frac{\Gamma \vdash f : A\_1 \times \cdots \times A\_n \to B \qquad \Gamma \vdash e\_i : A\_i \quad (1 \le i \le n)}{\Gamma \vdash f(e\_1, \ldots, e\_n) : B}$$

Fig. 7: Typing rules for Qbricks-DSL

comparison operators (<, ≤, ≥, >, =, ≠ : int × int → bool) and high-level circuit operators: ctl, invert : circ → circ for controlling and inverting circuits, and width, size : circ → int for counting the number of input and output wires and the number of gates (not counting ID or SWAP) in a circuit. See Figure 5 for the intuitive definition of the circuit combinators.

The typing rules are the usual ones, summarized for convenience in Figure 7.

4.2 Operational semantics. Like any regular functional programming language, Qbricks-DSL is equipped with an operational semantics based on beta-reduction and substitution. We define notions of value and applicative context as in Fig. 6. We then define a rewriting strategy as the relation C[e] → C[e′] whenever e → e′ is one of the rules of Table 8. The table is split into the rules for the language constructs and the rules defining the behavior of the constant functions. We only give a subset of the latter rules. For instance, the arithmetic operations are defined in a canonical manner, and the Boolean and comparison operators are defined similarly on values of type int and bool. The rules for the constant functions acting on circuits are also for the most part straightforward: for instance, the size of a sequence is the sum of the sizes of its components. The rules we do not provide are the ones for the control operation ctl: the intuition behind their definition can be found in [13]. For the elementary gates, any definition can be used (see e.g. [50]), as long as it can be written with the chosen set of gates. One just has to adjust the lemmas referring to ctl in Qbricks-Spec. Similarly, the inverses of elementary gates are not given: we can choose the usual ones from the literature, and this definition is then parametrized by the choice of gates.

4.3 Properties. The targeted low-level representation for an expression of type circ is a value made of the circuit data constructors presented in Figure 6: a value v of type circ is generated by the grammar SEQ(v1, v2) | ANC(v) | PAR(v1, v2) | CNOT | SWAP | ID | H | Ph(n) | Rz(n). Since recursions are reduced to finite iterations, we can derive the following lemma through simple inductive reasoning:


Table 8: Operational semantics for Qbricks-DSL

Lemma 1 (Safety properties and normalization). Provided that e : A is a closed expression, and provided that all the functions in e recursively admit (external) definitions, then either e is a value or it reduces. If Γ ⊢ e : A and e → e′, then Γ ⊢ e′ : A. Finally, the reduction strategy (→) is normalizing: there does not exist an infinite reduction sequence e<sup>1</sup> → e<sup>2</sup> → ...

Example 1. The example of Section 3.1 can be written in Qbricks-DSL as

let aux(x) = SEQ(x, H)
let main(n) = iter aux n ID

The function aux inputs a circuit and appends a Hadamard gate at the end. The function main then inputs an integer parameter n and iterates aux to obtain n Hadamard gates in sequence. In particular, one can show that, for instance,

main 4 →<sup>∗</sup> SEQ(SEQ(SEQ(SEQ(ID, H), H), H), H),

that is, a sequence of 4 Hadamard gates.
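For illustration, here is a hypothetical Python rendering of this evaluation (Qbricks-DSL itself is embedded in WhyML; the names below are ours): circuits are plain static data, and iter is ordinary bounded iteration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Gate:
    name: str                      # atomic gates: "ID", "H", "CNOT", ...

@dataclass(frozen=True)
class SEQ:                         # sequential composition of two circuits
    c1: object
    c2: object

ID, H = Gate("ID"), Gate("H")

def iterate(f, n, a):
    """iter f n a = f(f(... f(a) ...)), with n calls to f."""
    for _ in range(n):
        a = f(a)
    return a

def aux(x):                        # append a Hadamard gate
    return SEQ(x, H)

def main(n):                       # n Hadamard gates in sequence
    return iterate(aux, n, ID)

# Reproduces the reduction main 4 ->* SEQ(SEQ(SEQ(SEQ(ID, H), H), H), H)
assert main(4) == SEQ(SEQ(SEQ(SEQ(ID, H), H), H), H)
```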

4.4 Universality and usability of the chosen circuit constructs. A universal (resp. pseudo-universal) set of elementary gates is one whose gates can be composed in sequence and in parallel so as to realize (resp. approximate arbitrarily closely) any unitary matrix. In Qbricks-DSL, we chose the small pseudo-universal elementary set {CNOT, SWAP, ID, H} ∪ ⋃<sub>n∈ℕ</sub> {Ph(n), Rz(n)}. Other gates can then be defined as macros on top of them. If one aims at using Qbricks inside a verified compilation tool-chain, these macros can for instance be the gates of the targeted architecture.

4.5 Validity of circuits. A circuit is represented as a rigid rectangular shape with a fixed number of input and output wires. In particular, there is a notion of validity: a circ object only makes sense provided two constraints:


Note that even these syntactic constraints cannot be checked by a simple typing procedure, because of the higher-order reasoning involved here: the constraints must hold for any value of the parameters. All these constraints apply to parametrized circuits. They translate into constraints on the parameters of their related PPS and are expressed in our domain-specific logical specification language, Qbricks-Spec. They are meant to be sent as proof obligations to a proof engine.

Example 2. Note how the circuit generated by main in Example 1 is not necessarily a valid circuit (although in this case it is). This is one of the constraints that can be handled by Qbricks-Spec, as shown in Example 4.

4.6 Denotational semantics. As all expressions in Qbricks-DSL are terminating, one can use regular sets as a denotational semantics for the language. In order to handle the definitions coming up in Section 5, we include in the denotation of each type an "error" element ⊥. We therefore define the denotation of basic types as the set of their values: [|bool|] = {tt, ff, ⊥}, [|int|] = ℤ ∪ {⊥} and [|circ|] = {v | v : circ} ∪ {⊥}. Product types are defined as the set-product: [|A<sup>1</sup> ×···× An|] = ([|A1|] ×···× [|An|]) ∪ {⊥}, and the unit type denotes the singleton set together with ⊥. Finally, functions are denoted by set-functions from the input set to the output set. The denotations of the language constructs are the usual ones in a set-based semantics; for the constant functions, the definitions are the canonical ones: arithmetic operations map to arithmetic operations, for instance. In Qbricks-DSL, everything is well-defined and ⊥ is only attainable from ⊥. For instance, ⊥ + x = ⊥.

Note that in the denotational semantics one can build non-valid circuits. For instance, the circuit SEQ(CNOT, H) is a member of [|circ|]. This is to be expected as we have the following property:

Lemma 2 (Soundness). Provided that ⊢ e : A, we have [|e|] ∈ [|A|] \ {⊥}. Moreover, provided that e → e′, we have [|e|] = [|e′|].

It is however possible to formalize the notion of syntactically valid circuits as a subset of [|circ|].

Definition 1. We define the (syntactic) unary relation Vsyntax on [|circ|] as follows: each one of the gates belongs to Vsyntax; if C<sup>1</sup> and C<sup>2</sup> belong to Vsyntax then so does PAR(C1, C2); if moreover 2 ≤ [|width|](C1) then ANC(C1) belongs to Vsyntax; and if [|width|](C1) = [|width|](C2) then SEQ(C1, C2) belongs to Vsyntax.
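Definition 1 can be read as a recursive predicate over circuit values. The following is a hedged Python sketch (the tuple encoding, the gate-width table, and the assumption that ANC hides one wire are our own; Ph and Rz arguments are elided):

```python
# Circuits as nested tuples ("SEQ", c1, c2), ("PAR", c1, c2), ("ANC", c),
# or a gate name string.
GATE_WIDTH = {"ID": 1, "H": 1, "Ph": 1, "Rz": 1, "CNOT": 2, "SWAP": 2}

def width(c):
    if isinstance(c, str):
        return GATE_WIDTH[c]
    tag = c[0]
    if tag == "SEQ":
        return width(c[1])                 # both sides share one width
    if tag == "PAR":
        return width(c[1]) + width(c[2])
    if tag == "ANC":
        return width(c[1]) - 1             # assumption: one ancilla wire hidden

def valid(c):
    """V_syntax from Definition 1, as a recursive check."""
    if isinstance(c, str):
        return True                        # every gate is valid
    tag = c[0]
    if tag == "SEQ":
        return valid(c[1]) and valid(c[2]) and width(c[1]) == width(c[2])
    if tag == "PAR":
        return valid(c[1]) and valid(c[2])
    if tag == "ANC":
        return valid(c[1]) and width(c[1]) >= 2

assert valid(("SEQ", "H", "H"))
assert not valid(("SEQ", "CNOT", "H"))     # width mismatch, as in Section 4.6
```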

# 5 Qbricks-Spec

The language Qbricks-DSL is only aimed at manipulating circuits. The reasoning features of Qbricks —and the PPS introduced in Section 3— are defined in the logic and the specification tools offered within Qbricks-Spec.

5.1 Syntax of Qbricks-Spec. We define Qbricks-Spec as a first-order, predicate logic with the following syntax.

$$\begin{array}{ll}\text{Formula} & \phi, \psi ::= \phi \lor \psi \mid \phi \land \psi \mid \neg \phi \mid \phi \to \psi \mid R(\hat{e}\_1, \dots, \hat{e}\_n) \mid \hat{e}\_1 = \hat{e}\_2\\ \text{First-order expression} & \hat{e} ::= x \mid c(\hat{e}\_1, \dots, \hat{e}\_n) \mid f(\hat{e}\_1, \dots, \hat{e}\_n) \mid f\_\ell(\hat{e}\_1, \dots, \hat{e}\_n) \end{array}$$

The first-order expressions eˆ form a subset of Qbricks-DSL: they are restricted to variables and (formal) function calls to other first-order expressions. Unlike regular, general expressions, which are meant to be computational vehicles, these first-order expressions only aim at being reasoned upon. The function names are then extended with counterpart logical functions f<sub>ℓ</sub>. Among these new functions, we introduce one function iter<sub>f</sub> : int × A → A for each function f : A → A, standing for the equational counterpart of iteration<sup>5</sup>. The logic functions are defined equationally in the logic: see Section 6.4 for details. The relation R ranges over a list of constant relations over first-order expressions. In Qbricks-Spec, we identify relations and functions of return type bool. A special relation is equality: we explicitly introduce it in the syntax to emphasize the fact that Qbricks-Spec is meant to deal with equational theories.

The type system of Qbricks-Spec is extended with opaque types, equipped with constant functions and relations to reason upon them. They come with no computational content: the aim is purely to be able to express and prove specification properties of programs. This is why we do not incorporate them in Qbricks-DSL's type system.

The opaque types we consider in Qbricks-Spec are complex, real, pps, ket and bitvector. The operators and relations for these new types are given in Table 9. Note that in the rest of the paper we omit the cast operations i\_to\_r and r\_to\_c. We also use a declared exponentiation function [−] [−] overloaded with types complex × int → complex and real × int → real. For any integer n and Boolean b, the constructor bv\_cst builds the bit vector of length n and constant value b. The other functions for types complex, real and bitvector are standard. Types pps and ket are novel and form the main reasoning vehicle in Qbricks-Spec.

5.2 The types pps and ket. In short, the type pps encodes our parametrized path sum (PPS) representation for expressions of type circ in Qbricks-DSL, while ket encodes the notion of ket-vector. As these types are pure reasoning apparatuses, we only need them in Qbricks-Spec and they are defined uniquely through an equational theory.

<sup>5</sup> This is required to stay within the grammar of terms of Qbricks-Spec.


Table 9: Primary operators for Qbricks-Spec

The type pps is equipped with the four opaque accessors pps\_width, pps\_range, pps\_ket and pps\_angle from Section 3.2, and with the function pps\_apply. While path-sums compose nicely, a given linear map does not have a unique representative path-sum (partly due to the fact that phase polynomials are equal modulo 2π). To capture this equivalence, we introduce the constant relation pps\_equiv. In order to relate circuits and PPS, we introduce the constant function circ\_to\_pps: it returns one possible PPS that represents the input circuit. The chosen PPS is built in a constructive manner on the structure of the circuit. A useful relation is (− ⊳ −), relating a circuit and a PPS: it is defined as (c ⊳ h) ≜ pps\_equiv(circ\_to\_pps(c), h). Another useful macro is the function circ\_apply : circ × ket → ket, defined as

$$\mathtt{circ\\_apply}(C, k) \triangleq \mathtt{pps\\_apply}(\mathtt{circ\\_to\\_pps}(C), k)$$

The type ket is equipped with standard operations for manipulating ket-vectors (Table 9): bv\_to\_ket turns a bit vector into a basis ket-vector; ket\_length returns the number of qubits in the ket; ket\_get returns the amplitude of the corresponding basis ket-vector. The other operations are the usual operations on vectors: addition, subtraction, tensor product, scalar multiplication.

5.3 Denotational semantics of the new types. The denotational semantics of real and complex are respectively the sets ℝ ∪ {⊥} and ℂ ∪ {⊥}, and the denotations of the operators are the canonical ones. As in Section 4.6, ⊥ maps to ⊥, so for instance ⊥ +<sub>r</sub> x = ⊥. The denotation of bitvector is defined as the set of all bit-vectors, together with the "error" element ⊥. The constant functions are mapped to their natural candidate definitions, using ⊥ as the default result where they should not be defined. So for instance, [|bv\_cst|](−1, tt) = ⊥. An element of ket is meant to be a ket-vector: we define [|ket|] as the set of all possible ket-vectors $\sum_{n=0}^{2^m-1} \alpha_n |b_n\rangle_m$, for all possible $m \in \mathbb{N}$, $\alpha_n \in \mathbb{C}$ and bit-vectors $b_n$ of size m, together with the error element ⊥. Finally, pps is defined as the set of formal path-sums, as defined in Section 3.2, together with the error element ⊥. The denotations of the constant functions are defined as discussed in Section 5.2. As an example, [|pps\_range|] returns the range of the corresponding PPS. The map circ\_to\_pps builds a valid PPS out of the input circuit, or ⊥ if the circuit is not valid.

The defined PPS follows the structure of the circuit. For instance, as illustrated by Eq. (1), the PPS circ\_to\_pps(SEQ(C1, C2)) is the sequential composition of the two PPS circ\_to\_pps(C1) and circ\_to\_pps(C2). This kind of compositionality is what helps with automation.
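The sequential case can be checked numerically as well. In the sketch below (our own helper names, not the Qbricks API), a one-qubit path-sum is a function from a basis input to an amplitude vector; composing two Hadamard path-sums sums over the intermediate path variable and yields the identity:

```python
import numpy as np

def ps_H(x):
    """Path-sum of one Hadamard: (1/sqrt2) * sum_y e^{2 pi i (x*y/2)} |y>."""
    return np.array([1, (-1) ** x]) / np.sqrt(2)

def ps_seq(ps1, ps2, x):
    """Sequential composition of path-sums: the basis kets produced by ps1
    feed the input of ps2, summing over the intermediate path variable."""
    return sum(ps1(x)[k] * ps2(k) for k in (0, 1))

# H followed by H is the identity: column x of the identity matrix.
for x in (0, 1):
    assert np.allclose(ps_seq(ps_H, ps_H, x), np.eye(2)[:, x])
```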

5.4 Regular sequents in Qbricks-Spec. Formulas in Qbricks-Spec are typed objects and, as mentioned in Section 5.1, one can identify them with first-order expressions of type bool. Due to this correspondence, we shall simply say that logic judgments in Qbricks-Spec are well-formed judgments of the form Δ ⊢ φ, where well-formedness means that Δ ⊢ φ : bool is a valid typing judgment in Qbricks-DSL. That being said, a well-formed judgment Δ ⊢ φ is valid whenever it holds in the denotational semantics: for every instantiation σ sending x : A in Δ to [|A|], the denotation [|φ|]<sup>σ</sup> is valid. In particular, the (free) variables of φ can be regarded as universally quantified by the context Δ.

5.5 Parametricity of PPS. A regular path-sum is not parametric: it represents one fixed functional. So why did we choose [|pps|] to be a set of path-sums? Let us consider an example.

Example 3. Consider the motivating example of Section 3.1 and its instantiation in Example 1. The function main describes a family of circuits indexed by an integer parameter n. Now, consider the typing judgment

$$h: \mathtt{pps}, n: \mathtt{int} \vdash (\mathtt{main}(n) \rhd h): \mathtt{bool}.$$

It can be regarded as a relation between PPS h and integers n, valid whenever h represents main(n). Technically, this relation is not quite the graph of a function (since several PPS might match the circuit main(n)).

5.6 Standard matrix semantics and correctness of PPS semantics. Similarly to the type pps, Qbricks is endowed with a (logical) type matrix to handle the matrix interpretation of circuits, together with various functions and relations to reason about it. In particular, Qbricks features a function mat\_get : matrix × int × int → complex, formalizing access to a matrix element, and a function circ\_to\_mat : circ → matrix, realizing the matrix corresponding to a circuit. We then formally show, within our framework (proved in Why3), that for any valid circuit C and ket k of length width(C), applying circ\_to\_pps(C) to k is equivalent to multiplying k by circ\_to\_mat(C):

### Theorem 1 (Soundness of PPS wrt matrix semantics).

$$C: \texttt{circ},\ k: \texttt{ket} \vdash \texttt{ket\_length}(k) = \texttt{width}(C) \land \texttt{valid}(C) \rightarrow \texttt{apply\_mat}(\texttt{circ\_to\_mat}(C), k) = \texttt{pps\_apply}(\texttt{circ\_to\_pps}(C), k)$$
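To make the correspondence concrete, here is a minimal Python sketch (our own illustration, not part of the Qbricks development) comparing the path-sum and matrix semantics on the Hadamard gate, assuming the standard convention H|x⟩ = (1/√2)·Σ_y (−1)^{x·y}|y⟩:

```python
SQRT2 = 2 ** 0.5

# Path-sum view of HAD (assumed convention: one path variable y,
# phase (-1)^(x*y), output basis state |y>).
def had_pps_apply(x):
    """Apply HAD's path-sum to basis state |x>, with x in {0, 1}."""
    out = [0j, 0j]
    for y in (0, 1):
        out[y] += ((-1) ** (x * y)) / SQRT2
    return out

# Standard matrix semantics of HAD.
H = [[1 / SQRT2, 1 / SQRT2],
     [1 / SQRT2, -1 / SQRT2]]

def mat_apply(m, v):
    """Matrix-vector product over plain Python lists."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

# Both semantics agree on each basis ket, as Theorem 1 states in general.
for x in (0, 1):
    ket = [1 + 0j if i == x else 0j for i in range(2)]
    assert all(abs(a - b) < 1e-12
               for a, b in zip(had_pps_apply(x), mat_apply(H, ket)))
```

The check exhaustively compares the two semantics on the (finite) basis of a one-qubit state space; the theorem generalizes this to every valid circuit.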

# 6 Reasoning on Quantum Programs

Thanks to the logic presented in Section 5.4, it is possible to write Qbricks-Spec formulas and to express properties of terms in the restricted syntax of Section 5.1. Provided that the regular sequents are simple enough, they can be handled automatically by SMT solvers.

In this section, we define a specific Hoare logic, Hybrid Quantum Hoare Logic (HQHL), to express pre- and post-conditions for arbitrary Qbricks-DSL terms. We then discuss the validity of such judgments and explain how to decompose them into elementary, regular sequents (proof obligations). The claim, backed up by our experiments in Section 8, is that the obtained sequents are in practice simple enough to be dealt with automatically.

We do not present all HQHL rules here, but simply aim to give an intuition of how and why one can rely on an automated deductive system to derive Qbricks-Spec judgments. The complete set of HQHL rules is presented in [13].

6.1 HQHL judgments. To express program specifications with pre- and post-conditions, we introduce Hybrid Quantum Hoare Logic (HQHL) sequents of the form Δ ⊩ {φ} e {ψ} : A (we omit the type A when irrelevant or clear). The formula ψ can make use of a reserved free variable result of type A. Such a sequent is well-formed provided that Δ ⊢ φ : bool, Δ, result : A ⊢ ψ : bool and Δ ⊢ e : A are valid typing judgments. Note how the reserved free variable result is added to Δ for typing ψ. For convenience, as syntactic sugar we allow indexed variables result_i to stand for the i-th projection of a tuple.

The validity of an HQHL sequent is defined semantically, similarly to what was done in Section 5.4: Δ ⊩ {φ} e {ψ} : A is valid whenever it is well-formed and, for every instantiation σ sending x : A in Δ to [|A|] and sending result to [|e|], the denotation [|φ → ψ|]_σ is valid.

In the following sections, we describe the deduction rules that we rely on in Qbricks. They are designed to be used in a bottom-up strategy to break judgments down into pieces reasoning on smaller terms. Along the way, invariants and assertions need to be introduced. As usual, some of these assertions can be derived by computing weakest preconditions: we do not have to introduce every single one by hand. When reaching a term of the restricted grammar of Qbricks-Spec that cannot be further decomposed, one can rely on the rule

$$\frac{\Gamma \vdash \phi \rightarrow \psi[\mathtt{result} := \hat{e}]}{\Gamma \Vdash \{\phi\}\ \hat{e}\ \{\psi\} : A}\ (\mathsf{f\text{-}o})$$

to generate a proof obligation as a regular sequent in Qbricks-Spec.
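The (f-o) rule amounts to a substitution: the postcondition is instantiated with the term itself, and the triple collapses into a first-order implication. A toy sketch of this discharge step (illustrative, not Qbricks' actual proof-obligation engine; formulas are plain strings here):

```python
# The (f-o) rule: a triple {phi} e {psi} over a Qbricks-Spec term e
# becomes the proof obligation  phi -> psi[result := e].
def proof_obligation(phi: str, term: str, psi: str) -> str:
    """Discharge {phi} term {psi} into a first-order sequent (string form)."""
    return f"{phi} -> {psi.replace('result', term)}"

# Using the width equation for PAR from Section 6.4 as postcondition:
po = proof_obligation("width(C) = n", "PAR(C, C)", "width(result) = 2 * n")
# po == "width(C) = n -> width(PAR(C, C)) = 2 * n"
```

The resulting string is exactly the kind of regular sequent that is then shipped to SMT solvers.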

6.2 Deduction rules for term constructs. Figure 10 presents the deduction rules for the term constructs of Qbricks-DSL carrying computational content: iteration, tests, function evaluation, etc. We also present a standard weakening rule (weaken) and an example of a rewriting rule: the deduction rule (eq) states that whenever two expressions are equal, one can substitute one

$$\frac{\Gamma, x \Vdash \{\phi \land x \leq 0\}\ e_2\ \{P[x, \mathtt{result}]\} \qquad \Gamma, x, y \Vdash \{\phi \land P[x, y]\}\ f(y)\ \{P[x+1, \mathtt{result}]\}}{\Gamma \Vdash \{\phi\}\ \mathtt{iter}\ f\ \hat{e}_1\ e_2\ \{P[\hat{e}_1, \mathtt{result}]\}}\ (\mathsf{iter})$$

$$\frac{\Gamma \Vdash \{P\}\ e_1\ \{Q[x_i := \mathtt{result}_i]\} \qquad \Gamma, x_1, \ldots, x_n \Vdash \{Q\}\ e_2\ \{R\}}{\Gamma \Vdash \{P\}\ \mathtt{let}\ x_1, \ldots, x_n = e_1\ \mathtt{in}\ e_2\ \{R\}}\ (\mathsf{let})$$

$$\frac{\Gamma \Vdash \{P\}\ e_1\ \{Q[x := \mathtt{result}]\} \qquad \Gamma, x \Vdash \{Q \land x\}\ e_2\ \{R\} \qquad \Gamma, x \Vdash \{Q \land \neg x\}\ e_3\ \{R\}}{\Gamma \Vdash \{P\}\ \mathtt{if}\ e_1\ \mathtt{then}\ e_2[x := e_1]\ \mathtt{else}\ e_3[x := e_1]\ \{R\}}\ (\mathsf{if})$$

$$\frac{\forall i,\ \Gamma \Vdash \{P\}\ e_i\ \{R_i[\mathtt{result}]\}}{\Gamma \Vdash \{P\}\ e_1, \ldots, e_n\ \{R_1[\mathtt{result}_1] \land \cdots \land R_n[\mathtt{result}_n]\}}\ (\mathsf{tuple})$$

$$\frac{f(x_1, \ldots, x_n) \triangleq e \qquad \Gamma \Vdash \{P\}\ e[x_1 := e_1, \ldots, x_n := e_n]\ \{R\}}{\Gamma \Vdash \{P\}\ f(e_1, \ldots, e_n)\ \{R\}}\ (\mathsf{decl})$$

$$\frac{\Gamma \vdash P \rightarrow P' \qquad \Gamma \Vdash \{P'\}\ e\ \{Q'\} : A \qquad \Gamma, \mathtt{result} : A \vdash Q' \rightarrow Q}{\Gamma \Vdash \{P\}\ e\ \{Q\} : A}\ (\mathsf{weaken})$$

$$\frac{\Gamma \vdash e_1 = e_2 : A \qquad \Gamma \Vdash \{P[e_1]\}\ e[e_1]\ \{Q[e_1]\} : A}{\Gamma \Vdash \{P[e_2]\}\ e[e_2]\ \{Q[e_2]\} : A}\ (\mathsf{eq})$$

Fig. 10: Deduction rules for Qbricks: HQHL rules for term constructs

for the other inside an HQHL judgment. Finally, we can derive from the semantics the usual substitution rules: for instance, provided that Γ, x : A ⊢ ψ and Γ ⊢ ê : A, then Γ ⊢ ψ[x := ê]. Note that in the rules, the first-order expressions of the form ê are drawn from the restricted grammar of terms of Qbricks-Spec.

6.3 Deduction rules for pps. The main tools to relate circuits and PPS are the constant function circ\_to\_pps, its relational counterpart (− ▷ −), and the declared function circ\_apply. They can be specified inductively on the structure of the input circuit. The complete set of rules for circ\_to\_pps and (− ▷ −) can be found in [13].

Compositionality of SEQ. For instance, one can derive the deduction rules for circ\_apply applied to SEQ from Figure 11. These rules can be used in a bottom-up manner to derive composable, elementary properties of circuits from their sub-circuits. In the figure, we abbreviate pps\_acc(circ\_to\_pps(−)) as C\_acc, for acc ∈ {width, range, ket, angle}, and, given two bit-vectors x and y, x · y denotes their concatenation.
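The shape of the SEQ rules can be sketched executably: ranges add, the ket of the first path-sum feeds the second, and angles add along the way. The following toy composition (illustrative names and encoding, not Qbricks' actual API) checks that HAD sequenced with itself is the identity:

```python
import cmath
from itertools import product

# A pps is (width, range, angle, ket): angle(x, y) is a phase in turns,
# ket(x, y) the output basis state; x input bits, y path-variable bits.
def pps_apply(pps, x):
    """Sum over all path variables, as in the path-sum semantics."""
    w, r, angle, ket = pps
    out = {}
    for y in product((0, 1), repeat=r):
        amp = cmath.exp(2j * cmath.pi * angle(x, y)) / (2 ** (r / 2))
        out[ket(x, y)] = out.get(ket(x, y), 0j) + amp
    return out

def seq(p1, p2):
    """Sequence two path-sums, mirroring the SEQ_r/SEQ_a/SEQ_k rules."""
    w, r1, a1, k1 = p1
    _, r2, a2, k2 = p2
    return (w, r1 + r2,                                      # ranges add
            lambda x, y: a1(x, y[:r1]) + a2(k1(x, y[:r1]), y[r1:]),
            lambda x, y: k2(k1(x, y[:r1]), y[r1:]))          # kets thread

# HAD as a pps: one path variable, phase x.y/2 turns, ket = y.
HAD = (1, 1, lambda x, y: x[0] * y[0] / 2, lambda x, y: y)

# HAD ; HAD is the identity: amplitude 1 on |x>, 0 elsewhere.
for x in ((0,), (1,)):
    amps = pps_apply(seq(HAD, HAD), x)
    assert abs(amps[x] - 1) < 1e-12
    assert all(abs(a) < 1e-12 for k, a in amps.items() if k != x)
```

The interference that cancels the unwanted branches happens symbolically in the summed angles, which is precisely what makes path-sum reasoning compositional.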

Example of deduction rule for HAD. Using the notations from above, we define the following axiom for function circ\_to\_pps applied to the gate HAD:

$$\frac{\Gamma \Vdash \{\phi_1\}\ C_1\ \{\mathtt{C\_width}(\mathtt{result}, \{p\}) = w\} \qquad \Gamma \Vdash \{\phi_2\}\ C_2\ \{\mathtt{C\_width}(\mathtt{result}, \{p\}) = w\}}{\Gamma \Vdash \{\phi_1 \land \phi_2\}\ \mathtt{SEQ}(C_1, C_2)\ \{\mathtt{C\_width}(\mathtt{result}, \{p\}) = w\}}\ (\mathsf{SEQ}_w)\ \{\text{Prec-SEQ}\}$$

$$\frac{\Gamma \Vdash \{\phi_1\}\ C_1\ \{\mathtt{C\_range}(\mathtt{result}, \{p\}) = r_1(\{p\})\} \qquad \Gamma \Vdash \{\phi_2\}\ C_2\ \{\mathtt{C\_range}(\mathtt{result}, \{p\}) = r_2(\{p\})\}}{\Gamma \Vdash \{\phi_1 \land \phi_2\}\ \mathtt{SEQ}(C_1, C_2)\ \{\mathtt{C\_range}(\mathtt{result}, \{p\}) = r_1(\{p\}) + r_2(\{p\})\}}\ (\mathsf{SEQ}_r)\ \{\text{Prec-SEQ}\}$$

$$\frac{\begin{array}{c}\Gamma \Vdash \{\phi_1\}\ C_1\ \{\mathtt{C\_angle}(\mathtt{result}, \{p\})(x, y_1) = a_1(\{p\}, x, y_1)\} \qquad \Gamma \Vdash \{\phi_1\}\ C_1\ \{\mathtt{C\_ket}(\mathtt{result}, \{p\})(x, y_1) = k_1(\{p\}, x, y_1)\}\\ \Gamma \Vdash \{\phi_2\}\ C_2\ \{\mathtt{C\_angle}(\mathtt{result}, \{p\})(k_1(\{p\}, x, y_1), y_2) = a_2(\{p\}, x, y_1, y_2)\}\end{array}}{\Gamma \Vdash \{\phi_1 \land \phi_2\}\ \mathtt{SEQ}(C_1, C_2)\ \{\mathtt{C\_angle}(\mathtt{result}, \{p\})(x, y_1 \cdot y_2) = a_1(\{p\}, x, y_1) + a_2(\{p\}, x, y_1, y_2)\}}\ (\mathsf{SEQ}_a)\ \{\text{Prec-SEQ}\}$$

$$\frac{\Gamma \Vdash \{\phi_1\}\ C_1\ \{\mathtt{C\_ket}(\mathtt{result}, \{p\})(x, y_1) = k_1(\{p\}, x, y_1)\} \qquad \Gamma \Vdash \{\phi_2\}\ C_2\ \{\mathtt{C\_ket}(\mathtt{result}, \{p\})(k_1(\{p\}, x, y_1), y_2) = k_2(\{p\}, x, y_1 \cdot y_2)\}}{\Gamma \Vdash \{\phi_1 \land \phi_2\}\ \mathtt{SEQ}(C_1, C_2)\ \{\mathtt{C\_ket}(\mathtt{result}, \{p\})(x, y_1 \cdot y_2) = k_2(\{p\}, x, y_1 \cdot y_2)\}}\ (\mathsf{SEQ}_k)\ \{\text{Prec-SEQ}\}$$

Fig. 11: Deduction rules for circ\_apply on sequence of circuits

$$\left\{ \begin{array}{l} x: \mathtt{bitvector},\ y: \mathtt{bitvector}, \\ \mathtt{bv\_length}(x) = 1, \\ \mathtt{bv\_length}(y) = 1 \end{array} \right\}\ \mathtt{HAD}\ \left\{ \begin{array}{l} \mathtt{C\_width}(\mathtt{result}) = 1, \\ \mathtt{C\_range}(\mathtt{result}) = 1, \\ \mathtt{C\_angle}(\mathtt{result}, x, y) = x_{[0]} \ast y_{[0]}, \\ \mathtt{C\_ket}(\mathtt{result}, x, y) = y \end{array} \right\}$$

Example 4. Consider the motivating example of Section 3.1 and its instantiation in Example 1. We can now give a specification to the function main, as follows:

$$\begin{array}{c} \{ n: \mathtt{int},\ m: \mathtt{int},\ x: \mathtt{ket} \Vdash \mathtt{ket\_length}(x) = 1 \land n = 2 \ast m \} \\ \mathtt{main}(n) \\ \{ \mathtt{circ\_apply}(\mathtt{result}, x) = x \} \end{array}$$

The fact that circ\_apply is well-defined implies that the resulting circuit is valid.

6.4 Equational reasoning. The SMT solvers we aim to use to discharge proof obligations require equational theories describing how to reason on the constant functions that were introduced. Some of these equational theories, such as bit-vectors and algebraic fields, are standard and well-known in verification. Together with a few properties of square roots, exponentiation, and real and imaginary parts, these are all we need for real and complex: in quantum computation, the manipulation of real and complex numbers turns out to be quite limited; we do not need anything related to real or complex analysis.

The main difficulty in the design of Qbricks has been to lay out equational theories and lemmas for circ, pps and ket that can efficiently help in automatically discharging proof obligations. Many of these equations and lemmas are quite straightforward. For instance, we turn the rewriting rules of Table 8 into equations, such as x, y : circ ⊢ width(PAR(x, y)) = width(x) + width(y), or a : A, n : int ⊢ iter f (a, n + 1) = f(iter f (a, n)). These equations map the (syntactic) computational behavior of expressions into the logic.
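The iteration equation above can be read as an executable unfolding law. A small sketch (our own illustrative recursive definition of `iter_f`, not the Qbricks one):

```python
# The equation  iter_f(a, n + 1) = f(iter_f(a, n))  turned into code.
def iter_f(f, a, n):
    """Apply f to a, n times (n <= 0 yields a unchanged)."""
    return a if n <= 0 else f(iter_f(f, a, n - 1))

double = lambda v: 2 * v
# The unfolding law holds for every n we sample.
for n in range(5):
    assert iter_f(double, 3, n + 1) == double(iter_f(double, 3, n))
```

In the logic the same law is an equation that SMT solvers can apply as a rewrite step, without ever executing the iteration.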

Other equations express purely semantic properties. For instance,

$$\Gamma,\ k: \mathtt{ket} \vdash \mathtt{circ\_apply}(\mathtt{SEQ}(C_1, C_2), k) = \mathtt{circ\_apply}(C_2, \mathtt{circ\_apply}(C_1, k)) \quad (3)$$

(together with a few hypotheses ensuring correct widths) can be derived from Figure 11 and is part of the equational theory.

6.5 Additional deduction rules. Qbricks provides additional reasoning rules that we do not have enough space to detail here. Among them are:

Circuit complexity. Certifying the complexity of quantum implementations (e.g., a polynomial number of gates in the size of the input) is of primary importance since, in the mid-term, implementations will have to deal with limited hardware capacities, hence the need for tight circuit constructions. We stress that, while raised by several programming [30] or compilation works [48], this aspect of certification is not addressed by existing formal verification approaches [35,45,1].

Probabilities. The probability of obtaining a result by a measurement is correlated with the amplitudes of the corresponding ket-basis vectors in the quantum state of the memory. In Qbricks-Spec we define proba\_partial\_measure : circ × ket × bitvector → real, meaning that, when the input circuit is applied to the input ket, if we were to measure the result, the probability of obtaining the given bit-vector would be the result of the function.
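The intuition behind such a function, for the simple case of a full measurement, is just the Born rule: the probability of an outcome is the squared modulus of its amplitude. A sketch (names and the full-measurement simplification are ours, not Qbricks'):

```python
# Born rule for a full measurement in the computational basis.
def proba_measure(state, outcome):
    """state: dict mapping basis bit-vectors (tuples) to complex amplitudes."""
    return abs(state.get(outcome, 0j)) ** 2

# |+> = (|0> + |1>) / sqrt(2): each outcome has probability 1/2.
plus = {(0,): 1 / 2 ** 0.5, (1,): 1 / 2 ** 0.5}
assert abs(proba_measure(plus, (0,)) - 0.5) < 1e-12
assert abs(proba_measure(plus, (1,)) - 0.5) < 1e-12
```

Since this is an ordinary function over complex amplitudes, it can live in the specification logic even though measurement is not a circuit construct.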

Wire identification. In some situations, to add a gate to a circuit it is easier to give the number (identifier) of the wire on which the gate applies (such as "apply HAD on wire n") instead of sequencing the circuit with ID^{⊗(n−1)} ⊗ HAD. This is, for instance, the design chosen in QASM or SQIR [35].

In Qbricks it is possible to define such a macro with the use of a derived constructor PLACE(C, k, n). For any circuit C and any integers k, n, if 0 ≤ k ≤ n − width(C), then PLACE(C, k, n) applies C on wires k to k + width(C) − 1. It is defined as ID^{⊗k} ⊗ C ⊗ ID^{⊗(n−k−width(C))}, where for any 0 < i, ID_i ≜ iter par-ID (i − 1) ID and par-ID(C) ≜ PAR(C, ID). Similarly, Qbricks also provides a constructor CONT(C, c, k, n), with an additional wire index c in [0, n) and not in [k, k + width(C)). Using adequate qubit permutations, through combinations of PLACE and SWAP, it applies PLACE(C, k, n) with control c.
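At the matrix level, PLACE is just a Kronecker sandwich between identities. A minimal sketch (plain-Python `kron`, hypothetical X-gate example; not Qbricks code):

```python
# PLACE(C, k, n) at the matrix level: I^(2^k) (x) C (x) I^(2^(n-k-w)).
def kron(a, b):
    """Kronecker product of two matrices given as lists of lists."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]

def identity(d):
    return [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]

def place(c, k, n, w):
    """Matrix of circuit c (on w wires) placed on wires k..k+w-1 of n wires."""
    return kron(kron(identity(2 ** k), c), identity(2 ** (n - k - w)))

X = [[0.0, 1.0], [1.0, 0.0]]
m = place(X, 1, 2, 1)        # X on wire 1 of a 2-wire circuit
# With basis order |q0 q1>, the column for |00> has its 1 in row |01>.
assert m[1][0] == 1.0 and m[0][0] == 0.0
```

The verified development reasons about this construction symbolically, of course; the matrix view is only the semantic yardstick.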

# 7 Implementation

The framework described so far is implemented as a DSL embedded inside the Why3 deductive verification platform [9,25], written in the WhyML programming language. This allows us to benefit from several strengths of Why3, such as efficient code extraction toward OCaml, generation of proof obligations (implementing the HQHL mechanism) and access to several proof backends: SMT solvers, interactive proof commands or export to proof assistants (Coq, Isabelle/HOL), although we do not use this last option in our case studies.

The development itself counts 17,000+ lines of code, including 400+ definitions and 1700+ lemmas, all proved within Why3. Most of the development concerns the (verified) mathematical libraries. They cover the mathematical structures at stake in quantum computing (complex numbers, Kronecker products, bit-vectors, etc.), together with a formally verified collection of mathematical results. Only two theorems are assumed (for any real x with 0 ≤ x ≤ 1: sin(πx) ≤ πx and x ≤ sin(πx/2)). Proving them requires material on function derivatives, not available in Why3 so far; hence we chose to assume these standard results.
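The two assumed inequalities are easy to sanity-check numerically (this is of course sampling, not a proof):

```python
import math

# For real x with 0 <= x <= 1:  sin(pi*x) <= pi*x  and  x <= sin(pi*x/2).
for i in range(1001):
    x = i / 1000
    assert math.sin(math.pi * x) <= math.pi * x + 1e-12
    assert x <= math.sin(math.pi * x / 2) + 1e-12
```

Both are classical results: the first follows from sin(t) ≤ t for t ≥ 0, the second from the concavity of sin on [0, π/2].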

# 8 Case studies and experimental evaluation

We develop and prove parametric implementations of Grover's search, the Quantum Fourier Transform (QFT), Quantum Phase Estimation (QPE) and the first ever verified implementation of the quantum part of Shor's algorithm (Shor-OF). We also implemented Deutsch-Jozsa (DJ) for comparison.

8.1 Examples of formal specifications. Let us first introduce some of the formal specifications we proved. The specification for QPE [41,16] is shown in Figure 12(a). The procedure inputs a unitary operator U and an eigenvector |v⟩ of U and finds the ghost [26] eigenvalue e^{2iπφ_v} associated with |v⟩. The specification for Shor-OF [61] is shown in Figure 12(b). We developed a certified concrete implementation following the implementation proposed in [5], a reference in terms of complexity.<sup>6</sup> The specification for Grover [31] is shown in Figure 12(c). Given a predicate with k values mapped to true in [0, 2^n), Grover's algorithm outputs one of these true values with good probability.

Each of these specifications makes use of specific functions that we do not have the space to detail here (see [13] for details). We do, however, want to note two things. First, these specifications describe results of measurement (with the dedicated functions proba\_partial\_measure\_x). As discussed in Section 6.5, while Qbricks-DSL is not able to handle measurement, we are still able, with Qbricks-Spec, to reason on the result of a measurement, as this is a simple function over complex amplitudes. Another thing to note is that, for Shor-OF and Grover, our specifications discuss the polynomial size of the produced circuit.

<sup>6</sup> A further refinement is possible [5], using a hybrid version of the Quantum Fourier Transform, but it would require adding an effective measurement operation and classical control to Qbricks.

$$\left\{ \Gamma, (f: \mathtt{pps}), (n, k: \mathtt{int}) \Vdash (C \rhd f) \land \mathtt{width}(C) = n \land 0 < k \land \mathtt{Eigen}(f, |v\rangle, e^{2\pi i \ast \theta}) \right\} \;\ldots$$

(a) Specification for our implementation of Quantum Phase estimation

$$\left\{ \Gamma, (a, b, n: \mathtt{int}), (j: \mathtt{ghost\ int}) \Vdash \mathtt{co\_prime}(a, b) \land 1 \leq b < 2^n \land 1 \leq j < b \land a^j \,\%\, b = 1 \right\}$$

$$\begin{array}{l} \mathtt{Shor\text{-}circ}(a, r, n) \\ \left\{ \begin{array}{l} \mathtt{proba\_partial\_measure}\left(\mathtt{result}, |1\rangle_n, \mathrm{error}_1 \leq \frac{1}{2 \ast 2^{n/2}}\right) \geq \frac{4}{\pi^2} \ \land \\ \mathtt{proba\_partial\_measure}\left(\mathtt{result}, |1\rangle_n, \mathrm{error}_2 \leq \frac{1}{2 \ast 2^{n/2}}\right) \geq \frac{\phi(r)}{r} \times \frac{4}{\pi^2} \ \land \\ \mathtt{size}(\mathtt{result}) = \mathtt{Shor\text{-}poly}(n) \ \land \ \mathtt{ancillas}(\mathtt{result}) = n + 2 \ \land \ \mathtt{width}(\mathtt{result}) = 3 \ast n \end{array} \right\} \end{array}$$

(b) Specification for our implementation of Shor-OF algorithm

$$\begin{array}{c} \left\{ \begin{array}{l} \Gamma, (C: \mathtt{circ}), (f: \mathtt{int} \to \mathtt{bool}), (n, i, k: \mathtt{int}) \Vdash \\ \mathtt{implements}(C, f) \land 1 < n \land 1 \leq k < 2^n - 1 \land 1 \leq i \\ \land\ \mathtt{Card}(\{ j \mid 0 \leq j < 2^n \land f(j) = \mathtt{true} \}) = k \end{array} \right\} \\ \mathtt{Grover}(C, k, n) \\ \left\{ \begin{array}{l} \mathtt{proba\_partial\_measure}_f\left(\mathtt{result}, \mathtt{bv\_cat}(n, 0), f\right) = \sin^2\left(\arcsin\left(\sqrt{\tfrac{k}{2^n}}\right)(1 + 2i)\right) \ \land \\ \mathtt{size}(\mathtt{result}) = i \ast (\mathtt{size}(C) + \mathcal{O}(n)) \ \land \\ \mathtt{width}(\mathtt{result}) = n \land \mathtt{ancillas}(\mathtt{result}) = 1 \end{array} \right\} \end{array}$$

(c) Specification for our implementation of Grover's algorithm

Fig. 12: Specifications of the main implementations

8.2 Experimental evaluation. Different metrics about our formal developments are reported in Table 13<sup>7</sup>: lines of decorated code, number of lemmas, proof obligations (POs), automatically proven POs (within a 5-second time limit) and their percentage among POs, interactive commands entered to discharge them, and the time required for the automatic verification of these proofs.

Note that the metrics for each implementation strictly concern the code that is proper to it (e.g., QPE contains calls to QFT, but the QPE line in Table 13 does not include the QFT implementation). The whole Shor-OF development is reported in the "Shor-OF full" line.

Result. Qbricks allowed us to implement and verify, in a parametric manner, the Shor-OF, QPE and Grover algorithms, at a rather smooth cost and with high proof automation (95% on average, 95% for full Shor-OF).

8.3 Prior verification efforts. Before comparing our approach to prior attempts (Table 14), we first introduce these cases.

<sup>7</sup> Experiments were run on Linux, on a PC equipped with an Intel(R) Core(TM) i7-7820HQ 2.90GHz and 15 GB RAM. We used Why3 version 1.2.0 with solvers Alt-Ergo-2.2.0, CVC 3-2.4.1, CVC4-1.0, Z3-4.4.1.


#LoC + Spec.: lines of decorated code — # Extr.: lines of extracted code (OCaml) #Aut.: automatically proven POs — #Cmd: interactive commands

#Verif. time: automated proof verification time

Table 13: Implementation & verification for case studies with Qbricks

Regular path-sums. [2,1] use path sums for the verification of several circuits of complexity similar to that of QFT (QFT, Hidden Shift, generalized Toffoli, etc.). Yet, these experiments consider fixed circuits (up to 100 qubits) and the technique cannot be applied to parametric families of circuits or circuit-building languages.

QHL. Liu et al. [45] report on the parametric verification of Grover's search algorithm, in a restricted case<sup>8</sup> and in the high-level algorithm description formalism of QHL; in particular, QHL has no notion of circuit, so, for instance, one cannot reason about the size of a circuit within QHL.

SQIR. Finally, Hietala et al. [35] presented a parametric (circuit-building) implementation of the Deutsch-Jozsa algorithm in Coq, with two independent full correctness proofs. Recently (Oct. 2020), the authors also presented parametrized versions of the QFT, QPE and Grover algorithms [34].

8.4 Evaluation: benefits of PPS and Qbricks. To evaluate the proof-effort gain of using pps instead of matrices, Table 14 compares our case-study implementations with equivalent proven implementations from the literature: the Grover algorithm implementation from [45] in Isabelle/HOL and the implementations [35,34] using SQIR and Coq. As supplementary comparison points, we implemented Qbricks versions of both QFT and Deutsch-Jozsa using matrices exclusively.

For example, the Qbricks implementation of QFT with pps is 18 lines long, with 47 lines of specifications and intermediary lemmas, and its proof required 37 additional interactive commands, hence Spec + Cmd = 84. In comparison, the corresponding SQIR development uses 287 interactive commands (7.7x more).

Conclusion. Relying on PPS semantics and first-order logic instead of matrices and higher-order logics strongly eases the proof effort. In terms of command

<sup>8</sup> The case in [45, p. 232] concerns cases where the number k of sought values is equal to 2<sup>j</sup> for a given integer j.


#LoC.: lines of code – # Spec.: lines of spec. and lemmas – #Cmd: proof commands

Table 14: Compared implementations of case studies, using matrices and pps

lines, proofs are consistently at least 5.6x shorter than non-Qbricks examples, up to 13.6x for the case of Grover in QHL and 7.7x for QPE and QFT in SQIR.<sup>9</sup>

# 9 Related works

Formal verification of quantum circuits. Prior efforts on quantum circuit verification [27,45,70,53,56,1,2,35,34] have been described throughout the paper, especially in Sections 1, 3.1 and 8. Our technique is more automated than those based on interactive proving [35,34,45], borrows and extends the path-sum representation [2] to the parametric case, and considers a circuit-building language rather than a high-level algorithm description language [45].

Quantum Languages and Deductive Verification. Liu et al. [45] introduce Quantum Hoare Logic for high-level descriptions of quantum algorithms. QHL and our own HQHL are different, as the underlying formalisms have different focuses. While QHL deals with measurement and classical control, it does not allow reasoning on the structure of the circuit. Conversely, Qbricks does not handle classical control, but it brings better proof automation and deduction rules for reasoning on circuits. Combining the two approaches is an exciting research direction.

Verified Circuit Optimizations. Formal methods and other program analysis techniques are also used in quantum compilation for verifying circuit optimization techniques [52,6,32,3,62,57,35]. In particular, the ZX-calculus [17] represents

<sup>9</sup> The difference with SQIR in the column "Spec+Cmd" is less stringent. Indeed, SQIR syntax for specifications is often more succinct: e.g., Qbricks writes each precondition on a separate line, whereas Coq writes the same as a single-line conjunction.

quantum circuits by diagrams amenable to automatic simplification through dedicated rewriting rules. This framework leads to a graphical proof assistant [40] geared at certifying the semantic equivalence between circuit diagrams, with application to circuit equivalence checking and certified circuit compilation and optimization [21,20,39]. Yet, formal tools based on ZX-calculus are restricted to fixed circuits, and parametrized approaches are so far limited to pen-and-paper proofs [12].

Other quantum applications of formal methods. Huang et al. [36,37] propose a "runtime-monitoring like" verification method for quantum circuits, with an annotation language restricted to structural properties of interest (e.g., superposition or entanglement). Similarly, [44] describes a projection-based assertion language for quantum programs. Verification of these assertions is performed by statistical testing instead of formal proofs. The recent Silq language [8] also represents an advance on the way toward automation in quantum programming. It automates uncomputation operations, enabling the programmer to abstract from low-level implementation details. Specialized type systems for quantum programming languages, based on linear logic [60,59,43] and dependent types [51,53], have also been developed to tackle the non-duplicability of qubits and structural circuit constraints. Finally, formal methods are also at stake in the verification of quantum cryptography protocols [49,29,11,47,19].

# 10 Conclusion

We address the problem of automating correctness proofs of quantum programs. While relying on the general framework of deductive verification, we finely tune our domain-specific circuit-building language Qbricks-DSL, together with its new logical specification language Qbricks-Spec, in order to keep correctness reasoning over relevant quantum programs within first-order theories. We also introduce, and intensively build upon, parametrized path sums (PPS), a symbolic representation of quantum circuits as functions transforming quantum data registers. We develop verified parametric implementations of the Shor-OF algorithm (the first verified implementation) and of other famous non-trivial quantum algorithms (including QPE and Grover search), showing significant improvement over prior attempts, when available.

Acknowledgments. This work was supported in part by the French National Research Agency (ANR) under the research project SoftQPRO ANR17-CE25-0009-02, and by the DGE of the French Ministry of Industry under the research project PIA-GDN/QuantEx P163746-484124.

# References


<sup>10</sup> https://www.ibm.com/blogs/research/2019/10/on-quantum-supremacy/


<sup>11</sup> https://quantumcomputingreport.com/resources/tools/


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/ 4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Nested Session Types

Ankush Das<sup>1</sup>, Henry DeYoung<sup>1</sup>, Andreia Mordido<sup>2</sup>, and Frank Pfenning<sup>1</sup>

<sup>1</sup> Carnegie Mellon University, Pittsburgh, PA, USA {ankushd,hdeyoung,fp}@cs.cmu.edu <sup>2</sup> LASIGE, Faculdade de Ciências, Universidade de Lisboa, Lisbon, Portugal afmordido@ciencias.ulisboa.pt

**Abstract.** Session types statically describe communication protocols between concurrent message-passing processes. Unfortunately, parametric polymorphism even in its restricted prenex form is not fully understood in the context of session types. In this paper, we present the metatheory of session types extended with prenex polymorphism and, as a result, nested recursive datatypes. Remarkably, we prove that type equality is decidable by exhibiting a reduction to trace equivalence of deterministic first-order grammars. Recognizing the high theoretical complexity of the latter, we also propose a novel type equality algorithm and prove its soundness. We observe that the algorithm is surprisingly efficient and, despite its incompleteness, sufficient for all our examples. We have implemented our ideas by extending the Rast programming language with nested session types. We conclude with several examples illustrating the expressivity of our enhanced type system.

# **1 Introduction**

Session types express and enforce interaction protocols in message-passing systems [29,44]. In this work, we focus on binary session types that describe bilateral protocols between two endpoint processes performing dual actions. Binary session types obtained a firm logical foundation when they were shown to be in a Curry-Howard correspondence with linear logic propositions [7,8,47]. This allows us to rely on properties of cut reduction to derive type safety properties such as progress (deadlock freedom) and preservation (session fidelity), which continue to hold even when extended to recursive types and processes [17].

However, the theory of session types is still missing a crucial piece: a general understanding of prenex (or ML-style) parametric polymorphism, encompassing recursively defined types, polymorphic type constructors, and nested types. We abbreviate the sum of these features simply as nested types [3]. Prior work has restricted itself to parametric polymorphism either: in prenex form without nested types [26,45]; with explicit higher-rank quantifiers [6,38] (including bounded ones [24]) but without general recursion; or in specialized form for iteration at the type level [46]. None of these allow a free, nested use of polymorphic type constructors combined with prenex polymorphism.

In this paper, we develop the metatheory of this rich language of nested session types. Nested types are reasonably well understood in the context of functional languages [3,32] and have a number of interesting applications [10,28,37]. One difficult point is the interaction of nested types with polymorphic recursion and type inference [36]. By adopting bidirectional type-checking we avoid this particular set of problems altogether, at the cost of some additional verbosity. However, we face a new problem, namely how to handle type equality (≡), given that session type definitions are generally equirecursive and not generative. This means that even before we consider nesting, with the definitions

$$\mathsf{list}[\alpha] = \oplus \{ \mathtt{nil} : \mathbf{1}, \mathtt{cons} : \alpha \otimes \mathsf{list}[\alpha] \} \qquad \mathsf{list}'[\alpha] = \oplus \{ \mathtt{nil} : \mathbf{1}, \mathtt{cons} : \alpha \otimes \mathsf{list}'[\alpha] \}$$

we have list[A] ≡ list′[B] and also list[list′[A]] ≡ list′[list[B]] provided A ≡ B. The reason is that both types specify the same communication behavior; only their name (which is irrelevant) is different. As the second of these equalities shows, deciding the equality of nested occurrences of type constructors is inescapable: allowing type constructors (which are necessary in many practical examples) means we also have to solve type equality for nested types. For example, the types Tree[α] and STree[α, κ] represent binary trees and their faithfully (and efficiently) serialized form, respectively.

$$\mathsf{Tree}[\alpha] = \oplus \{ \mathsf{node} : \mathsf{Tree}[\alpha] \otimes \alpha \otimes \mathsf{Tree}[\alpha], \mathsf{leaf} : \mathsf{1} \}$$

$$\mathsf{STree}[\alpha, \kappa] = \oplus \{ \mathsf{nd} : \mathsf{STree}[\alpha, \alpha \otimes \mathsf{STree}[\alpha, \kappa]], \mathsf{lf} : \kappa \}$$

We have that Tree[α] ⊗ κ is isomorphic to STree[α, κ] and that the processes witnessing the isomorphism can be easily implemented (see Section 9).
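As a sanity check, the isomorphism can be prototyped outside the session-typed setting. The following sketch is ours, not the Rast code of Section 9: it serializes a binary tree into the flat label stream prescribed by STree[α, κ] and recovers the tree from it, with the token `"k"` standing in for the continuation κ.

```python
# Sketch (ours) of the Tree/STree isomorphism: a node emits "nd" and
# continues with its left subtree, whose continuation carries the element
# and the serialized right subtree; a leaf emits "lf" and continues with
# the ambient continuation (type kappa).
from dataclasses import dataclass
from typing import Union, List, Tuple

@dataclass
class Leaf:
    pass

@dataclass
class Node:
    left: "Tree"
    value: int
    right: "Tree"

Tree = Union[Leaf, Node]

def serialize(t: Tree, rest: List) -> List:
    if isinstance(t, Leaf):
        return ["lf"] + rest
    return ["nd"] + serialize(t.left, [t.value] + serialize(t.right, rest))

def deserialize(stream: List) -> Tuple[Tree, List]:
    head, rest = stream[0], stream[1:]
    if head == "lf":
        return Leaf(), rest
    left, rest = deserialize(rest)
    value, rest = rest[0], rest[1:]
    right, rest = deserialize(rest)
    return Node(left, value, right), rest

t = Node(Node(Leaf(), 1, Leaf()), 2, Leaf())
assert deserialize(serialize(t, ["k"])) == (t, ["k"])
```

Round-tripping every tree through `serialize` and `deserialize` plays the role of the witness pair for the isomorphism, here over plain lists instead of channels.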

At the core of type checking lies type equality. We show that we can translate type equality for nested session types to the trace equivalence problem for deterministic first-order grammars, shown to be decidable by Jančar, albeit with doubly-exponential complexity [31]. Solomon [42] already proved a related connection between inductive type equality for nested types and language equality for deterministic pushdown automata (DPDA). The difference is that standard session type equality is defined coinductively, as a bisimulation, rather than via language equivalence [23]. This is because session types capture communication behavior rather than the structure of closed values, so a type such as R = ⊕{**a** : R} is not equal to the empty type E = ⊕{}. The reason is that the former type can send infinitely many **a**'s while the latter cannot, and hence their communication behavior is different, implying that the types must be different. Interestingly, if we imagine a lazy functional language such as Haskell with non-generative recursive types, then R and E would also be different. In fact, nothing in our analysis of equirecursive nested types depends on linearity, just on the coinductive interpretation of types. Our key results, namely decidability of type equality and a practical algorithm for it, apply to lazy functional languages!

The decision procedure for deterministic first-order grammars does not appear to be directly suitable for implementation, in part due to its doubly-exponential complexity bound. Instead, we develop an algorithm combining loop detection [23] with instantiation [18] and a special treatment of reflexivity. The algorithm is sound but incomplete, and reports success, a counterexample, or an inconclusive outcome (which counts as failure). In our experience, the algorithm is surprisingly efficient and sufficient for all our examples.

We have implemented nested session types and integrated them with the Rast language that is based on session types [17,18,19]. We have evaluated our prototype on several examples such as the Dyck language [21], an expression server [45] and serializing binary trees, and standard polymorphic data structures such as lists, stacks and queues.

Most closely related to our work is context-free session types (CFSTs) [45]. CFSTs also enhance the expressive power of binary session types, by extending types with a notion of sequential composition. In connection with CFSTs, we identified a proper fragment of nested session types that is closed under sequential composition and into which CFSTs can be encoded; nested session types are therefore strictly more expressive than CFSTs.

The main technical contributions of our work are:


# **2 Overview of Nested Session Types**

The main motivation for studying nested types is quite practical and generally applicable to programming languages with structural type systems. We start by applying parametric type constructors to a standard polymorphic queue data structure. We also demonstrate how the types can be made more precise using nesting. A natural consequence of having nested types is the ability to capture (communication) patterns characterized by context-free languages. As an illustration, we express the Dyck language of balanced parentheses and also show how nested types are connected to DPDAs.

*Queues* A standard application of parameterized types is the definition of polymorphic data structures such as lists, stacks, or queues. As a simple example, consider the nested type:

$$\mathsf{queue}[\alpha] \triangleq \& \{ \mathsf{ins} : \alpha \multimap \mathsf{queue}[\alpha], \mathsf{del} : \oplus \{ \mathsf{none} : \mathsf{1}, \mathsf{some} : \alpha \otimes \mathsf{queue}[\alpha] \} \}$$

The type queue, parameterized by α, represents a queue with values of type α. A process providing this type offers an external choice (&) enabling the client either to insert a value of type α into the queue (label **ins**), or to delete a value from the queue (label **del**). After receiving the label **ins**, the provider expects to receive a value of type α (the ⊸ operator) and then proceeds to offer queue[α]. Upon reception of the label **del**, the provider queue is either empty, in which case it sends the label **none** and terminates the session (as prescribed by type **1**), or non-empty, in which case it sends a value of type α (the ⊗ operator) and recurses with queue[α].

Although parameterized type definitions are sufficient to express the standard interface to polymorphic data structures, we propose nested session types which are considerably more expressive. For instance, we can use type parameters to track the number of elements in the queue in its type!

$$\mathsf{queue}[\alpha, x] \stackrel{\Delta}{=} \& \{ \mathsf{ins} : \alpha \multimap \mathsf{queue}[\alpha, \mathsf{Some}[\alpha, x]], \mathsf{del} : x \}$$

$$\mathsf{Some}[\alpha, x] \triangleq \oplus \{ \mathtt{some} : \alpha \otimes \mathtt{queue}[\alpha, x] \} \qquad \qquad \mathsf{None} \triangleq \oplus \{ \mathtt{none} : \mathtt{1} \}$$

The second type parameter x tracks the number of elements. This parameter can be understood as a symbol stack. On inserting an element, we recurse to queue[α, Some[α, x]] denoting the push of Some symbol on stack x. We initiate the empty queue with the type queue[α, None] where the second parameter denotes an empty symbol stack. Thus, a queue with n elements would have the type queue[α, Some<sup>n</sup>[α, None]]. On receipt of the del label, the type transitions to x which can either be None (if the queue is empty) or Some[α, x] (if the queue is non-empty). In the latter case, the type sends label **some** followed by an element, and transitions to queue[α, x] denoting a pop from the symbol stack. In the former case, the type sends the label none and terminates. Both these behaviors are reflected in the definitions of types Some and None.
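The symbol-stack reading can be made concrete with a small simulation. The function below is our own illustration (its name and string rendering are hypothetical, not part of the formal system): it tracks the type state of queue[α, x] under a sequence of insertions and deletions.

```python
# Track the type state of queue[a, x], starting from queue[a, None]:
# each 'ins' pushes one Some symbol, each 'del' pops one (or reaches None).
def type_after(actions):
    depth = 0  # number of Some symbols stacked over None
    for act in actions:
        if act == "ins":
            depth += 1          # queue[a, x] -> queue[a, Some[a, x]]
        elif act == "del" and depth > 0:
            depth -= 1          # Some[a, x] sends 'some', pops back to queue[a, x]
        elif act == "del":
            return "None"       # empty queue: sends 'none' and terminates
    stack = "None"
    for _ in range(depth):
        stack = f"Some[a, {stack}]"
    return f"queue[a, {stack}]"

assert type_after(["ins", "ins"]) == "queue[a, Some[a, Some[a, None]]]"
assert type_after(["ins", "del"]) == "queue[a, None]"
```

A queue holding n elements thus sits at type queue[α, Someⁿ[α, None]], matching the invariant described above.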

*Context-Free Languages* Recursive session types capture the class of regular languages [45]. However, in practice, many useful languages are beyond regular. As an illustration, suppose we would like to express a balanced-parentheses language, also known as the Dyck language [21], with the end-marker \$. We use **L** to denote an opening symbol and **R** to denote a closing symbol (in a session-typed mindset, **L** can represent a client request and **R** a server response). We need to enforce that each **L** has a corresponding closing **R** and that they are properly nested. To express this, we need to track the number of **L**'s in the output with the session type. However, this notion of memory is beyond the expressive power of regular languages, so mere recursive session types will not suffice.

We utilize the expressive power of nested types to express this behavior.

$$T[x] \stackrel{\Delta}{=} \oplus \{ \mathbf{L} : T[T[x]], \mathbf{R} : x \} \qquad D \stackrel{\Delta}{=} \oplus \{ \mathbf{L} : T[D], \$ : \mathbf{1} \}$$

The nested type T[x] takes x as a type parameter and either outputs **L** and continues with T[T[x]], or outputs **R** and continues with x. The type D either outputs **L** and continues with T[D], or outputs \$ and terminates. The type D expresses a Dyck word with end-marker \$ [34].

The key idea here is that the number of T's in the type of a word tracks the number of unmatched **L**'s in it. Whenever the type T[x] outputs **L**, it recurses with T[T[x]] incrementing the number of T's in the type by 1. Dually, whenever the type outputs **R**, it recurses with x decrementing the number of T's in the type by 1. The type D denotes a balanced word with no unmatched **L**'s. Moreover, since we can only output \$ (or **L**) at the type D and not **R**, we obtain the invariant that any word of type D must be balanced. If we imagine the parameter x as the symbol stack, outputting an **L** pushes T on the stack, while outputting **R** pops T from the stack. The definition of D ensures that once an **L** is outputted, the symbol stack is initialized with T[D] indicating one unmatched **L**.
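The push/pop discipline above can be run directly as a machine. In this sketch (ours), the current type is represented as a stack of T symbols over D, and a word is accepted exactly when the simulation ends at type **1**:

```python
def accepts_dyck(word):
    stack = ["D"]                    # current type; top of stack at the end
    for c in word:
        if not stack:
            return False             # type 1 reached: no further output
        top = stack[-1]
        if top == "D":
            if c == "L":
                stack.append("T")    # D --L--> T[D]
            elif c == "$":
                stack.pop()          # D --$--> 1
            else:
                return False
        else:                        # top == "T"
            if c == "L":
                stack.append("T")    # T[x] --L--> T[T[x]]
            elif c == "R":
                stack.pop()          # T[x] --R--> x
            else:
                return False
    return not stack                 # accepted iff we ended at type 1

assert accepts_dyck("LLRRLR$") and not accepts_dyck("LL$")
```

The stack depth is exactly the number of unmatched **L**'s, mirroring the number of nested T's in the type.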

Nested session types do not force communication to be balanced: the type D′ below models the cropped Dyck language, in which unbalanced words can also be captured.

$$T'[x] \stackrel{\Delta}{=} \oplus \{ \mathbf{L} : T'[T'[x]], \mathbf{R} : x, \ $ : \mathbf{1} \} \qquad \qquad D' \stackrel{\Delta}{=} \oplus \{ \mathbf{L} : T'[D'], \$  : \mathbf{1} \}$$

The only difference between the types T[x] and T′[x] is that T′[x] allows us to terminate at any point using the \$ label, which immediately transitions to type **1**. Nested session types can not only capture the class of deterministic context-free languages recognized by DPDAs that accept by empty stack (balanced words), but also the class of deterministic context-free languages recognized by DPDAs that accept by final state (cropped words).

*Multiple Kinds of Parentheses* We can use nested types to express more general words with different kinds of parentheses. Let **L** and **L′** denote two kinds of opening symbols, while **R** and **R′** denote their corresponding closing symbols, respectively. We define the session types

$$\begin{array}{l} S[x] \stackrel{\scriptstyle \Delta}{=} \oplus \{ \mathbf{L} : S[S[x]], \mathbf{L}' : S'[S[x]], \mathbf{R} : x \} \\ S'[x] \stackrel{\scriptstyle \Delta}{=} \oplus \{ \mathbf{L} : S[S'[x]], \mathbf{L}' : S'[S'[x]], \mathbf{R}' : x \} \\ E \stackrel{\scriptstyle \Delta}{=} \oplus \{ \mathbf{L} : S[E], \mathbf{L}' : S'[E], \$ : \mathbf{1} \} \end{array}$$

We push the symbols S and S′ onto the stack on outputting **L** and **L′**, respectively. Dually, we pop S and S′ from the stack on outputting **R** and **R′**, respectively. The type E then denotes an empty stack, thereby representing a balanced Dyck word. This technique can be generalized to any number of kinds of brackets.

*Multiple States as Multiple Parameters* Using defined type names with multiple type parameters, we enable types to capture the language of DPDAs with several states. Consider the language $L_3 = \{\mathbf{L}^n \mathbf{a}\, \mathbf{R}^n \mathbf{a} \cup \mathbf{L}^n \mathbf{b}\, \mathbf{R}^n \mathbf{b} \mid n > 0\}$, proposed by Korenjak and Hopcroft [34]. A word in this language starts with a sequence of opening symbols **L**, followed by an intermediate symbol, either **a** or **b**. Then the word contains as many closing symbols **R** as there were **L**'s, and terminates with the symbol **a** or **b** matching the intermediate symbol.

$$U \stackrel{\Delta}{=} \oplus \{ \mathbf{L} : O[C[A], C[B]] \} \qquad \qquad O[x, y] \stackrel{\Delta}{=} \oplus \{ \mathbf{L} : O[C[x], C[y]], \mathbf{a} : x, \mathbf{b} : y \}$$

$$C[x] \stackrel{\Delta}{=} \oplus \{ \mathbf{R} : x \} \qquad \qquad A \stackrel{\Delta}{=} \oplus \{ \mathbf{a} : \mathbf{1} \} \qquad \qquad B \stackrel{\Delta}{=} \oplus \{ \mathbf{b} : \mathbf{1} \}$$

The $L_3$ language is characterized by the session type U. Since the type U is unaware of which intermediate symbol among **a** or **b** will eventually be chosen, it cleverly maintains two symbol stacks in the two type parameters x and y of O. We initiate type U by outputting **L** and transitioning to O[C[A], C[B]], where the symbol C tracks that we have outputted one **L**. The types A and B represent the intermediate symbols that might be used in the future. The type O[x, y] can either output an **L** and transition to O[C[x], C[y]], pushing the symbol C onto both stacks; or it can output **a** (or **b**) and transition to the first (resp. second) type parameter x (resp. y). Intuitively, the type parameter x has the form $C^n[A]$ for n > 0 (resp. y has the form $C^n[B]$). Then, the type C[x] outputs an **R** and pops the symbol C from the stack by transitioning to x. Once all the closing symbols have been outputted (note that we cannot terminate preemptively), we transition to type A or B depending on the intermediate symbol chosen. Type A outputs **a** and terminates, and similarly, type B outputs **b** and terminates. Thus, we simulate the $L_3$ language (not possible with context-free session types [45]) using two type parameters.
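The two-stack intuition also runs directly as a machine. The sketch below (our own illustration) executes the types U, O, C, A and B, growing the two symbol stacks in lockstep:

```python
def accepts_L3(word):
    it = iter(word)
    if next(it, None) != "L":             # U --L--> O[C[A], C[B]]
        return False
    x, y = ["A", "C"], ["B", "C"]         # two symbol stacks, top at the end
    state, stack = "O", None
    for c in it:
        if state == "O":
            if c == "L":                  # O[x,y] --L--> O[C[x], C[y]]
                x.append("C"); y.append("C")
            elif c == "a":
                state, stack = "pop", x   # O[x,y] --a--> x
            elif c == "b":
                state, stack = "pop", y   # O[x,y] --b--> y
            else:
                return False
        elif state == "pop":
            top = stack.pop() if stack else None
            if top == "C" and c == "R":   # C[x] --R--> x
                continue
            if (top, c) in (("A", "a"), ("B", "b")):
                state = "done"            # A --a--> 1,  B --b--> 1
            else:
                return False
        else:
            return False                  # no output after termination
    return state == "done"

assert accepts_L3("LLaRRa") and accepts_L3("LbRb")
assert not accepts_L3("LLaRRb")
```

Choosing **a** or **b** selects which of the two stacks is subsequently popped, just as the type O[x, y] transitions to x or y.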

More broadly, nested types can neatly capture complex server-client interactions. For instance, client requests can be captured using labels **L**, **L′** while server responses can be captured using labels **R**, **R′**, expressing multiple kinds of requests. Balanced words then represent that all requests have been handled. The types can also guarantee that responses do not exceed requests.

# **3 Description of Types**

The underlying base system of session types is derived from a Curry-Howard interpretation [7,8] of intuitionistic linear logic [25]. Below we describe the session types, their operational interpretation and the continuation type.


The basic type operators have the usual interpretation: the internal choice operator $\oplus\{\ell : A_\ell\}_{\ell \in L}$ selects a branch with label $\ell \in L$ and corresponding continuation type $A_\ell$; the external choice operator $\&\{\ell : A_\ell\}_{\ell \in L}$ offers a choice with labels $\ell \in L$ and corresponding continuation types $A_\ell$; the tensor operator $A \otimes B$ represents the channel-passing type that consists of sending a channel of type $A$ and proceeding with type $B$; dually, the lolli operator $A \multimap B$ consists of receiving a channel of type $A$ and continuing with type $B$; the terminated session $\mathbf{1}$ is the operator that closes the session.

We also support type constructors to define new type names. A type name V is defined according to a type definition V[α] = A that is parameterized by a sequence of distinct type variables α that the type A can refer to. We can use type names in a type expression using V[B]. Type expressions can also refer to parameters α available in scope. The free variables of a type A are the type variables that occur freely in A. Types without any free variables are called closed types. We call any type not of the form V[B] structural.

All type definitions are stored in a finite global signature Σ defined as

$$\text{Signature } \Sigma ::= \cdot \mid \Sigma, V[\overline{\alpha}] = A$$

In a valid signature, all definitions V [α] = A are contractive, meaning that A is structural, i.e. not itself a type name. This allows us to take an equirecursive view of type definitions, which means that unfolding a type definition does not require communication. More concretely, the type V [B] is considered equivalent to its unfolding A[B/α]. We can easily adapt our definitions to an isorecursive view [35,20] with explicit unfold messages. All type names V occurring in a valid signature must be defined, and all type variables defined in a valid definition must be distinct. Furthermore, for a valid definition V [α] = A, the free variables occurring in A must be contained in α. This top-level scoping of all type variables is what we call the prenex form of polymorphism.

# **4 Type Equality**

Central to any practical type-checking algorithm is type equality. In our system, it is necessary for the rule of identity (forwarding) and process spawn, as well as the channel-passing constructs for types $A \otimes B$ and $A \multimap B$. However, with nested polymorphic recursion, checking equality becomes challenging. We first develop the underlying theory of equality, providing its definition, and then establish its reduction to checking trace equivalence of deterministic first-order grammars.

### **4.1 Type Equality Definition**

Intuitively, two types are equal if they permit exactly the same communication behavior. Formally, type equality is captured using a coinductive definition following seminal work by Gay and Hole [23].

**Definition 1.** We first define unfoldΣ(A) as

$$\frac{V[\overline{\alpha}] = A \in \Sigma}{\mathtt{unfold}_{\Sigma}(V[\overline{B}]) = A[\overline{B}/\overline{\alpha}]}\ \text{def} \qquad\qquad \frac{A \neq V[\overline{B}]}{\mathtt{unfold}_{\Sigma}(A) = A}\ \text{str}$$

Unfolding a structural type simply returns A. Since type definitions are contractive [23], the result of unfolding is never a type name application and it always terminates in one step.
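As an illustration (ours, with a deliberately simplified representation of types in which a branch like cons : α ⊗ list[α] is flattened to its recursive part), one-step unfolding can be implemented directly from Definition 1:

```python
def unfold(sigma, A):
    # A type name application ('app', V, args) unfolds by substitution;
    # structural types (and variables) unfold to themselves in one step.
    if isinstance(A, tuple) and A[0] == "app":
        _, V, args = A
        params, body = sigma[V]
        return substitute(body, dict(zip(params, args)))
    return A

def substitute(A, sub):
    if isinstance(A, str):                # a type variable (or constant)
        return sub.get(A, A)
    if A[0] == "app":
        return ("app", A[1], [substitute(B, sub) for B in A[2]])
    op, branches = A                      # e.g. ('oplus', {label: type})
    return (op, {l: substitute(B, sub) for l, B in branches.items()})

# list[alpha] = +{ nil : 1, cons : list[alpha] }   (element type elided)
sigma = {"list": (["alpha"],
                  ("oplus", {"nil": "one",
                             "cons": ("app", "list", ["alpha"])}))}
assert unfold(sigma, ("app", "list", ["one"])) == \
       ("oplus", {"nil": "one", "cons": ("app", "list", ["one"])})
```

Contractiveness is what guarantees the returned type is never itself a name application.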

**Definition 2.** Let Type be the set of closed type expressions (no free variables). A relation R ⊆ Type × Type is a type bisimulation if (A, B) ∈ R implies:

**–** If $\mathtt{unfold}_{\Sigma}(A) = \oplus\{\ell : A_\ell\}_{\ell \in L}$, then $\mathtt{unfold}_{\Sigma}(B) = \oplus\{\ell : B_\ell\}_{\ell \in L}$ and $(A_\ell, B_\ell) \in \mathcal{R}$ for all $\ell \in L$. (The clauses for the other type operators are analogous.)


**Definition 3.** Two closed types A and B are equal (A ≡ B) iff there exists a type bisimulation R such that (A, B) ∈ R.

When the signature Σ is not clear from context we add a subscript, A ≡<sup>Σ</sup> B. This definition only applies to types with no free type variables. Since we allow parameters in type definitions, we need to define equality in the presence of free type variables. To this end, we define the notation ∀V. A ≡ B where V is a collection of type variables and A and B are valid types w.r.t. V (i.e., free variables in A and B are contained in V).

**Definition 4.** We define ∀V. A ≡ B iff for all closed type substitutions σ : V, we have A[σ] ≡ B[σ].

### **4.2 Decidability of Type Equality**

Solomon [42] proved that types defined using parametric type definitions with an inductive interpretation can be translated to DPDAs, thus reducing type equality to language equality on DPDAs. However, our type definitions have a coinductive interpretation. As an example, consider the types A = ⊕{**a** : A} and B = ⊕{**b** : B}. With an inductive interpretation, types A and B are empty (because they do not have terminating symbols) and, thus, are equal. However, with a coinductive interpretation, type A will send an infinite number of **a**'s, and B will send an infinite number of **b**'s, and are thus not equal. Our reduction needs to account for this coinductive behavior.

We show that type equality of nested session types is decidable via a reduction to the trace equivalence problem for deterministic first-order grammars [30]. A first-order grammar is a structure $(\mathcal{N}, \mathcal{A}, \mathcal{S})$ where $\mathcal{N}$ is a set of non-terminals, $\mathcal{A}$ is a finite set of actions, and $\mathcal{S}$ is a finite set of production rules. The arity of a non-terminal $X \in \mathcal{N}$ is written $\mathsf{arity}(X) \in \mathbb{N}$. Production rules rely on a countable set of variables $\mathcal{V}$, and on the set $\mathcal{T}_{\mathcal{N}}$ of regular terms over $\mathcal{N} \cup \mathcal{V}$. A term is regular if its set of subterms is finite (see [30]).

Each production rule has the form $X\overline{\alpha} \xrightarrow{a} E$, where $X \in \mathcal{N}$ is a non-terminal, $a \in \mathcal{A}$ is an action, and $\overline{\alpha} \in \mathcal{V}^*$ are variables that the term $E \in \mathcal{T}_{\mathcal{N}}$ can refer to. A grammar is deterministic if for each pair of $X \in \mathcal{N}$ and $a \in \mathcal{A}$, there is at most one rule of the form $X\overline{\alpha} \xrightarrow{a} E$ in $\mathcal{S}$. The substitution of terms $\overline{B}$ for variables $\overline{\alpha}$ in a rule $X\overline{\alpha} \xrightarrow{a} E$, denoted by $X\overline{B} \xrightarrow{a} E[\overline{B}/\overline{\alpha}]$, is the rule $(X\overline{\alpha} \xrightarrow{a} E)[\overline{B}/\overline{\alpha}]$. Given a set of rules $\mathcal{S}$, the trace of a term $T$ is defined as $\mathsf{trace}_{\mathcal{S}}(T) = \{\overline{a} \in \mathcal{A}^* \mid (T \xrightarrow{\overline{a}} T') \in \mathcal{S}, \text{ for some } T' \in \mathcal{T}_{\mathcal{N}}\}$. Two terms are trace equivalent, written $T \sim_{\mathcal{S}} T'$, if $\mathsf{trace}_{\mathcal{S}}(T) = \mathsf{trace}_{\mathcal{S}}(T')$.
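These definitions can be made concrete with a toy grammar. The sketch below (ours) encodes the two rules $X\alpha \xrightarrow{a} X\alpha$ and $Y\alpha \xrightarrow{b} Y\alpha$, which mirror the types A = ⊕{**a** : A} and B = ⊕{**b** : B} discussed above, and enumerates traces up to a bounded depth:

```python
# Rules map (non-terminal, action) to a right-hand side: a non-terminal
# applied to a selection of the left-hand side's argument positions.
rules = {("X", "a"): ("X", [0]),
         ("Y", "b"): ("Y", [0])}

def step(term, action):
    head, args = term
    rhs = rules.get((head, action))
    if rhs is None:
        return None
    V, idxs = rhs
    return (V, [args[i] for i in idxs])

def traces(term, depth):
    """All action strings of length <= depth enabled from `term`."""
    out = {""}
    if depth == 0:
        return out
    for (head, a) in rules:
        if head == term[0]:
            out |= {a + w for w in traces(step(term, a), depth - 1)}
    return out

X, Y = ("X", [("bot", [])]), ("Y", [("bot", [])])
assert traces(X, 3) == {"", "a", "aa", "aaa"}
assert traces(X, 3) != traces(Y, 3)   # hence X and Y are not trace equivalent
```

Determinism is visible in the rule table: at most one entry per (non-terminal, action) pair. Full trace equivalence of course quantifies over all depths; bounded exploration can only refute it.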

The crux of the reduction lies in the observation that session types can be translated to terms and type definitions can be translated to production rules of a first-order grammar. We start the translation of nested session types to grammars by first making an initial pass over the signature and introducing fresh internal names such that the new type definitions alternate between structural (except **1** and α) and non-structural types. These internal names are parameterized over their free type variables, and their definitions are added to the signature. This internal renaming simplifies the next step where we translate this extended signature to grammar production rules.

Example 1. As a running example, consider the queue type from Section 2:

$$Q[\alpha] = \&\{ \mathbf{ins} : \alpha \multimap Q[\alpha], \mathbf{del} : \oplus \{ \mathbf{none} : \mathbf{1}, \mathbf{some} : \alpha \otimes Q[\alpha] \} \}$$

After performing internal renaming for this type, we obtain the following signature:

$$\begin{aligned} Q[\alpha] &= \&\{\mathbf{ins} : X_0[\alpha], \mathbf{del} : X_1[\alpha] \} & X_1[\alpha] &= \oplus \{\mathbf{none} : \mathbf{1}, \mathbf{some} : X_2[\alpha] \} \\ X_0[\alpha] &= \alpha \multimap Q[\alpha] & X_2[\alpha] &= \alpha \otimes Q[\alpha] \end{aligned}$$

We introduce the fresh internal names X0, X1 and X2 (parameterized with the free variable α) to represent the continuation type in each case. Note the alternation between structural and non-structural types (of the form V[B]).

Next, we translate this extended signature to the grammar G = (N , A, S) aimed at reproducing the behavior prescribed by the types as grammar actions.

$$\begin{split} \mathcal{N} &= \{Q, X_0, X_1, X_2, \bot\} \\ \mathcal{A} &= \{\&\mathsf{ins}, \&\mathsf{del}, \multimap_1, \multimap_2, \oplus\mathsf{none}, \oplus\mathsf{some}, \otimes_1, \otimes_2\} \\ \mathcal{S} &= \{Q\alpha \xrightarrow{\&\mathsf{ins}} X_0\alpha, \ Q\alpha \xrightarrow{\&\mathsf{del}} X_1\alpha, \ X_0\alpha \xrightarrow{\multimap_1} \alpha, \ X_0\alpha \xrightarrow{\multimap_2} Q\alpha, \\ &\quad\ \ X_1\alpha \xrightarrow{\oplus\mathsf{none}} \bot, \ X_1\alpha \xrightarrow{\oplus\mathsf{some}} X_2\alpha, \ X_2\alpha \xrightarrow{\otimes_1} \alpha, \ X_2\alpha \xrightarrow{\otimes_2} Q\alpha \} \end{split}$$

Essentially, each defined type name is translated to a fresh non-terminal. Each type definition then corresponds to a set of rules: one for each possible continuation type, with the appropriate label leading to that continuation. For instance, the type Q[α] has two possible continuations: transition to X0[α] with action &**ins** or to X1[α] with action &**del**. The rules for all other type names are analogous. When the continuation is **1**, we transition to the nullary non-terminal ⊥, disabling any further action. When the continuation is α, we transition to α. Since each type name is defined only once, the produced grammar is deterministic.

Formally, the translation from an (extended) signature to a grammar is handled by two simultaneous tasks: translating type definitions into production rules (function τ below), and converting type names, variables, and the terminated session into grammar terms (function {·}). The function {·} : OType → $\mathcal{T}_{\mathcal{N}}$ from open session types to grammar terms is defined by:

$$\begin{aligned} \{\mathbf{1}\} &= \bot & \text{type 1 translates to } \bot\\ \{\alpha\} &= \alpha & \text{type variables translate to themselves} \\ \{V[B\_1, \ldots, B\_n]\} &= V\{B\_1\} \cdots \{B\_n\} & \text{type names translate to first-order terms} \end{aligned}$$

Due to this mapping, throughout this section we use type names interchangeably as type names and as non-terminal first-order symbols.

The function τ converts a type definition V [α] = A into a set of production rules and is defined according to the structure of A as follows:

$$\begin{split} \tau(V[\overline{\alpha}] = \oplus \{\ell : A_{\ell}\}_{\ell \in L}) &= \{V\overline{\alpha} \xrightarrow{\oplus\ell} \{A_{\ell}\} \mid \ell \in L\} \\ \tau(V[\overline{\alpha}] = \&\{\ell : A_{\ell}\}_{\ell \in L}) &= \{V\overline{\alpha} \xrightarrow{\&\ell} \{A_{\ell}\} \mid \ell \in L\} \\ \tau(V[\overline{\alpha}] = A_{1} \otimes A_{2}) &= \{V\overline{\alpha} \xrightarrow{\otimes_{i}} \{A_{i}\} \mid i = 1,2\} \\ \tau(V[\overline{\alpha}] = A_{1} \multimap A_{2}) &= \{V\overline{\alpha} \xrightarrow{\multimap_{i}} \{A_{i}\} \mid i = 1,2\} \end{split}$$

The function τ identifies the actions and continuation types corresponding to A and translates them to grammar rules. Internal and external choices lead to actions ⊕ℓ and &ℓ, for each ℓ ∈ L, with Aℓ as the continuation type. The type A1 ⊗ A2 enables two possible actions, ⊗1 and ⊗2, with continuations A1 and A2 respectively. Similarly, A1 ⊸ A2 produces the actions ⊸1 and ⊸2 with A1 and A2 as respective continuations. Contractiveness ensures that there are no definitions of the form V[α] = V[B]. Our internal renaming ensures that we do not encounter cases of the form V[α] = **1** or V[α] = α because we do not generate internal names for them. For the same reason, the {·} function is only defined on the types **1**, α and V[B].
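The translation τ is easy to render as code. The sketch below (ours, with ad hoc textual action names) turns each type definition into its rule set, keyed by (type name, action):

```python
def tau(name, body):
    op, rules = body[0], {}
    if op in ("oplus", "with"):            # internal / external choice
        sym = "+" if op == "oplus" else "&"
        for label, cont in body[1].items():
            rules[(name, sym + label)] = cont
    elif op in ("tensor", "lolli"):        # A1 (x) A2  and  A1 -o A2
        sym = "*" if op == "tensor" else "-o"
        for i, cont in enumerate(body[1:], start=1):
            rules[(name, f"{sym}{i}")] = cont
    return rules

# Q[alpha] = &{ins: X0[alpha], del: X1[alpha]},  X0[alpha] = alpha -o Q[alpha]
rules = {}
rules.update(tau("Q", ("with", {"ins": ("app", "X0", ["alpha"]),
                                "del": ("app", "X1", ["alpha"])})))
rules.update(tau("X0", ("lolli", "alpha", ("app", "Q", ["alpha"]))))
assert rules[("Q", "&ins")] == ("app", "X0", ["alpha"])
assert rules[("X0", "-o2")] == ("app", "Q", ["alpha"])
```

Applied pointwise over a signature, this produces a rule set in the style of the running example.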

The function τ is extended to translate a signature by being applied pointwise. Formally, $\tau(\Sigma) = \bigcup_{(V[\overline{\alpha}] = A) \in \Sigma} \tau(V[\overline{\alpha}] = A)$. Connecting all the pieces, we define the fog function that translates a signature to a grammar as:

$$\mathsf{fog}(\Sigma) = (\mathcal{N}, \mathcal{A}, \mathcal{S}), \text{ where } \mathcal{S} = \tau(\Sigma), \quad \mathcal{N} = \{ X \mid (X\overline{\alpha} \xrightarrow{a} E) \in \tau(\Sigma) \}, \quad \mathcal{A} = \{ a \mid (X\overline{\alpha} \xrightarrow{a} E) \in \tau(\Sigma) \}$$

The grammar is constructed by first computing τ (Σ) to obtain all the production rules. N and A are constructed by collecting the set of non-terminals and actions from these rules. The finite representation of session types and uniqueness of definitions ensure that fog(Σ) is a deterministic first-order grammar.

Checking equality of types A and B given signature Σ finally reduces to (i) internal renaming of Σ to produce Σ′, and (ii) checking trace equivalence of the terms {A} and {B} given the grammar fog(Σ′). If A and B are themselves structural, we generate internal names for them as well during the internal renaming process. Since we assume an equirecursive and non-generative view of types, it is easy to show that internal renaming does not alter the communication behavior of types and preserves type equality. Formally, $A \equiv_{\Sigma} B$ iff $A \equiv_{\Sigma'} B$.

**Theorem 1.** $A \equiv_{\Sigma} B$ if and only if $\{A\} \sim_{\mathcal{S}} \{B\}$, where $(\mathcal{N}, \mathcal{A}, \mathcal{S}) = \mathsf{fog}(\Sigma')$ and $\Sigma'$ is the extended signature for $\Sigma$.

Proof. For the direct implication, assume that $\{A\} \not\sim_{\mathcal{S}} \{B\}$. Pick a sequence of actions in the difference of the traces and let $w_0$ be its greatest prefix occurring in both traces. Either $w_0$ is a maximal trace for one of the terms, or we have $\{A\} \xrightarrow{w_0} A'$ and $\{B\} \xrightarrow{w_0} B'$, with $A' \xrightarrow{a_1} A''$ and $B' \xrightarrow{a_2} B''$, where $a_1 \neq a_2$. In both cases, with a simple case analysis on the definition of the translation τ, we conclude that $A \not\equiv B$. For the reciprocal implication, assume that $\{A\} \sim_{\mathcal{S}} \{B\}$. Consider the relation

$$\mathcal{R} = \{ (A_0, B_0) \mid \mathtt{trace}_{\mathcal{S}}(\{A_0\}) = \mathtt{trace}_{\mathcal{S}}(\{B_0\}) \} \subseteq \mathtt{Type} \times \mathtt{Type}.$$

Obviously, $(A, B) \in \mathcal{R}$. To prove that $\mathcal{R}$ is a type bisimulation, let $(A_0, B_0) \in \mathcal{R}$ and proceed by case analysis on $A_0$ and $B_0$. For the case $\mathtt{unfold}_{\Sigma}(A_0) = \oplus\{\ell : A_\ell\}_{\ell \in L}$, we have $\{A_0\} \xrightarrow{\oplus\ell} \{A_\ell\}$. Since, by hypothesis, the traces coincide, $\mathtt{trace}_{\mathcal{S}}(\{A_0\}) = \mathtt{trace}_{\mathcal{S}}(\{B_0\})$, we have $\{B_0\} \xrightarrow{\oplus\ell} \{B_\ell\}$ and, thus, $\mathtt{unfold}_{\Sigma}(B_0) = \oplus\{\ell : B_\ell\}_{\ell \in L}$. Moreover, Jančar [30] proves that $\mathtt{trace}_{\mathcal{S}}(\{A_\ell\}) = \mathtt{trace}_{\mathcal{S}}(\{B_\ell\})$. Hence, $(A_\ell, B_\ell) \in \mathcal{R}$. The other cases and a detailed proof can be found in [14].

However, type equality is not restricted to closed types (see Definition 4). To decide equality of open types, i.e., $\forall \mathcal{V}.\ A \equiv B$ given signature $\Sigma$, we introduce a fresh label $\ell_\alpha$ and a type $A_\alpha$ for each $\alpha \in \mathcal{V}$. We extend the signature with the type definitions $\Sigma^* = \Sigma \cup \bigcup_{\alpha \in \mathcal{V}} \{A_\alpha = \oplus\{\ell_\alpha : A_\alpha\}\}$. We then replace all occurrences of $\alpha$ in $A$ and $B$ with $A_\alpha$ and check their equality with signature $\Sigma^*$. We prove that this substitution preserves equality.

**Theorem 2.** ∀V. A ≡<sup>Σ</sup> B iff A[σ∗] ≡<sup>Σ</sup><sup>∗</sup> B[σ∗] where σ∗(α) = A<sup>α</sup> for all α ∈ V.

Proof (Sketch). The direct implication is trivial since $\sigma^*$ is a closed substitution. Reciprocally, we assume that $\forall \mathcal{V}.\ A \not\equiv_{\Sigma} B$. Then there must exist some substitution $\sigma'$ such that $A[\sigma'] \not\equiv_{\Sigma} B[\sigma']$. We use this constraint to prove that $A[\sigma^*] \not\equiv_{\Sigma^*} B[\sigma^*]$. The exact details can be found in our tech report [14].

**Theorem 3.** Checking ∀V. A ≡ B is decidable.

Proof. Theorem 2 reduces equality of open types to equality of closed types. Theorem 1 reduces equality of closed nested session types to trace equivalence of first-order grammars. Jančar [30] proved that trace equivalence for first-order grammars is decidable, hence establishing the decidability of equality for nested session types.

# **5 Practical Algorithm for Type Equality**

Although type equality can be reduced to trace equivalence for first-order grammars (Theorem 1 and Theorem 2), the latter problem has a very high theoretical complexity with no known practical algorithm [30]. In response, we have designed a coinductive algorithm for approximating type equality. Taking inspiration from Gay and Hole [23], we attempt to construct a bisimulation. Our proposed algorithm is sound but incomplete and can terminate in three states: (i) types are proved equal by constructing a bisimulation, (ii) counterexample detected by identifying a position where types differ, or (iii) terminated without a conclusive answer due to incompleteness. We interpret both (ii) and (iii) as a failure of type-checking (but there is a recourse; see Section 5.1). The algorithm

$$\frac{\mathcal{V} \,;\, \Gamma \vdash A_\ell \equiv B_\ell \;\; (\forall \ell \in L)}{\mathcal{V} \,;\, \Gamma \vdash \oplus\{\ell : A_\ell\}_{\ell \in L} \equiv \oplus\{\ell : B_\ell\}_{\ell \in L}} \;\oplus \qquad \frac{\mathcal{V} \,;\, \Gamma \vdash A_\ell \equiv B_\ell \;\; (\forall \ell \in L)}{\mathcal{V} \,;\, \Gamma \vdash \&\{\ell : A_\ell\}_{\ell \in L} \equiv \&\{\ell : B_\ell\}_{\ell \in L}} \;\&$$

$$\frac{\mathcal{V} \,;\, \Gamma \vdash A_1 \equiv B_1 \quad \mathcal{V} \,;\, \Gamma \vdash A_2 \equiv B_2}{\mathcal{V} \,;\, \Gamma \vdash A_1 \otimes A_2 \equiv B_1 \otimes B_2} \;\otimes \qquad \frac{\mathcal{V} \,;\, \Gamma \vdash A_1 \equiv B_1 \quad \mathcal{V} \,;\, \Gamma \vdash A_2 \equiv B_2}{\mathcal{V} \,;\, \Gamma \vdash A_1 \multimap A_2 \equiv B_1 \multimap B_2} \;\multimap$$

$$\frac{}{\mathcal{V} \,;\, \Gamma \vdash \mathbf{1} \equiv \mathbf{1}} \;\mathbf{1} \qquad \frac{\alpha \in \mathcal{V}}{\mathcal{V} \,;\, \Gamma \vdash \alpha \equiv \alpha} \;\mathsf{var} \qquad \frac{\mathcal{V} \,;\, \Gamma \vdash \overline{A} \equiv \overline{A'}}{\mathcal{V} \,;\, \Gamma \vdash V[\overline{A}] \equiv V[\overline{A'}]} \;\mathsf{refl}$$

$$\frac{V_1[\overline{\alpha_1}] = A \in \Sigma \quad V_2[\overline{\alpha_2}] = B \in \Sigma \quad \mathcal{C} = \langle \mathcal{V} \,;\, V_1[\overline{A_1}] \equiv V_2[\overline{A_2}] \rangle \quad \mathcal{V} \,;\, \Gamma, \mathcal{C} \vdash_{\Sigma} A[\overline{A_1}/\overline{\alpha_1}] \equiv B[\overline{A_2}/\overline{\alpha_2}]}{\mathcal{V} \,;\, \Gamma \vdash_{\Sigma} V_1[\overline{A_1}] \equiv V_2[\overline{A_2}]} \;\mathsf{expd}$$

$$\frac{\langle \mathcal{V}' \,;\, V_1[\overline{A'_1}] \equiv V_2[\overline{A'_2}] \rangle \in \Gamma \quad \exists \sigma' : \mathcal{V}'. \;\; \mathcal{V} \,;\, \Gamma \vdash V_1[\overline{A'_1}[\sigma']] \equiv V_1[\overline{A_1}] \;\wedge\; \mathcal{V} \,;\, \Gamma \vdash V_2[\overline{A'_2}[\sigma']] \equiv V_2[\overline{A_2}]}{\mathcal{V} \,;\, \Gamma \vdash V_1[\overline{A_1}] \equiv V_2[\overline{A_2}]} \;\mathsf{def}$$

**Fig. 1.** Algorithmic rules for type equality

is deterministic (no backtracking) and the implementation is quite efficient in practice. For all our examples, type checking is instantaneous (see Section 8).

The fundamental operation in the equality algorithm is loop detection, where we determine whether we have already added an equation A ≡ B to the bisimulation we are constructing. Due to the presence of open types with free type variables, determining whether we have already considered an equation becomes a difficult operation. To that end, we make an initial pass over the given types and introduce fresh internal names as described in Example 1 (but also for **1** and α, for simplicity). In the resulting signature, defined type names and structural types alternate, and we can perform loop detection entirely on defined type names (whether internal or external). The formal rules for this internal renaming are described in the technical report [14].
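As a small illustration of this pass, the following sketch introduces a fresh internal name for every structural subexpression of a closed type, so that names and structural types alternate. The tuple encoding and the names `internal_name` and `%i` are illustrative assumptions, not the Rast implementation, and the parameterization over free type variables needed for open types is omitted.

```python
# Types: ("plus"/"with", {label: A}), ("tensor", A, B), ("one",),
# ("var", a), ("name", V, args). Internal names are written %i and
# are not visible to the programmer. Illustrative encoding only.
def internal_name(A, sig):
    """Return a type name standing for A, extending sig with internal
    definitions so that every structural type is named."""
    tag = A[0]
    if tag == "name":
        return A
    if tag in ("plus", "with"):
        body = (tag, {l: internal_name(B, sig) for l, B in A[1].items()})
    elif tag == "tensor":
        body = (tag, internal_name(A[1], sig), internal_name(A[2], sig))
    else:  # ("one",) and ("var", a) also get names, for simplicity
        body = A
    name = "%" + str(len(sig))
    sig[name] = ([], body)
    return ("name", name, [])

sig = {}
top = internal_name(("plus", {"a": ("one",), "b": ("tensor", ("one",), ("one",))}), sig)
# every subexpression now has a definition, so loop detection can
# work entirely on defined type names
```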

Based on the invariants established by internal names, the algorithm only needs to alternately compare two type names or two structural types. The rules are shown in Figure 1. The judgment has the form V ; Γ ⊢Σ A ≡ B, where V contains the free type variables of the types A and B, Σ is a fixed valid signature containing type definitions of the form V[α] = C, and Γ is a collection of closures ⟨V ; V1[A1] ≡ V2[A2]⟩. If a derivation can be constructed, all closed instances of all closures are included in the resulting bisimulation (see the proof of Theorem 4). A closed instance of a closure ⟨V ; V1[A1] ≡ V2[A2]⟩ is obtained by applying a closed substitution σ over the variables in V, i.e., V1[A1[σ]] ≡ V2[A2[σ]] such that the types V1[A1[σ]] and V2[A2[σ]] have no free type variables. Because the signature Σ is fixed, we elide it from the rules in Figure 1.

In the type equality algorithm, the rules for type operators simply compare the components. If the type constructors (or the label sets in the ⊕ and & rules) do not match, then type equality fails, having constructed a counterexample to bisimulation. Similarly, two type variables are considered equal iff they have the same name, as expressed in the var rule.

The rule of reflexivity is needed explicitly here (but not in the version of Gay and Hole) due to the incompleteness of the algorithm: we might otherwise fail to recognize that type names parameterized with equal types are equal. Note that the refl rule checks a sequence of types in its premise.

Now we come to the key rules, expd and def. In the expd rule we expand the definitions of V1[A1] and V2[A2] and add the closure ⟨V ; V1[A1] ≡ V2[A2]⟩ to Γ. Since the equality of V1[A1] and V2[A2] must hold for all their closed instances, extending Γ with the corresponding closure remembers exactly that.

The def rule only applies when there already exists a closure in Γ with the same type names V1 and V2. In that case, we try to find a substitution σ' over V' such that V1[A1] is equal to V1[A'1[σ']] and V2[A2] is equal to V2[A'2[σ']]. Immediately after, the refl rule applies and recursively calls the equality algorithm on the type parameters. The substitution σ' is computed by a standard matching algorithm on first-order terms (which runs in linear time), applied to the syntactic structure of the types. The existence of such a substitution ensures that any closed instance of ⟨V ; V1[A1] ≡ V2[A2]⟩ is also a closed instance of ⟨V' ; V1[A'1] ≡ V2[A'2]⟩, which is already present in the constructed type bisimulation, and we can terminate our equality check, having successfully detected a loop.

The algorithm so far is sound, but potentially non-terminating. There are two sources of non-termination: (i) when encountering name/name equations, we can use the expd rule indefinitely, and (ii) the def rule calls the type equality algorithm recursively. To ensure termination in the former case, we restrict the expd rule so that for any pair of type names V1 and V2 there is an upper bound on the number of closures of the form ⟨− ; V1[−] ≡ V2[−]⟩ allowed in Γ. We call this upper bound the depth bound of the algorithm and allow the programmer to specify it. Surprisingly, a depth bound of 1 suffices for all of our examples. In the latter case, instead of calling the general type equality algorithm, we introduce the notion of rigid equality, denoted by V ; Γ ⊩ A ≡ B. The only difference between general and rigid equality is that rigid equality may not employ the expd rule. Since the size of the types reduces in all equality rules except for expd, the algorithm terminates. When comparing two instantiated type names, our algorithm first tries reflexivity, then tries to close a loop with def, and only if neither is applicable or both fail do we expand the definitions with the expd rule. Note that if type names have no parameters, our algorithm specializes to Gay and Hole's (with the small optimizations of reflexivity and internal naming), which means our algorithm is sound and complete on monomorphic types.

*Soundness.* We establish the soundness of the equality algorithm by constructing a type bisimulation from a derivation of V ; Γ ⊢ A ≡ B by (i) collecting the conclusions of all the sequents and (ii) forming all closed instances of them.

**Definition 5.** Given a derivation D of V ; Γ ⊢ A ≡ B, we define the set S(D) of closures. For each sequent (regular or rigid) of the form V ; Γ ⊢ A ≡ B in D, we include the closure ⟨V ; A ≡ B⟩ in S(D).

**Theorem 4 (Soundness).** If V ; · ⊢ A ≡ B, then ∀V. A ≡ B. Consequently, if V is empty, we get A ≡ B.

Proof. Given a derivation D0 of V0 ; · ⊢ A0 ≡ B0, construct S(D0) and define the relation R0 as follows:

$$\mathcal{R}\_0 = \{ (A[\sigma], B[\sigma]) \mid \langle \mathcal{V} \,;\, A \equiv B \rangle \in \mathcal{S}(\mathcal{D}\_0) \text{ and } \sigma \text{ over } \mathcal{V} \} $$

Then, construct R<sup>i</sup> (i ≥ 1) as follows:

$$\mathcal{R}\_i = \{ (V[\overline{A}], V[\overline{B}]) \mid V[\overline{\alpha}] = C \in \Sigma \text{ and } (A^j, B^j) \in \mathcal{R}\_{i-1} \,\forall j \in 1..|\overline{\alpha}| \}$$

Consider R to be the reflexive transitive closure of $\bigcup_{i \geq 0} \mathcal{R}_i$. Note that extending a relation by its reflexive transitive closure preserves its bisimulation properties since the bisimulation is strong. If R is a type bisimulation, then our theorem follows since the closure ⟨V0 ; A0 ≡ B0⟩ ∈ S(D0), and hence, for any closed substitution σ, (A0[σ], B0[σ]) ∈ R.

All that remains is to prove that R is a type bisimulation. We achieve this via a case analysis on the rule that added a pair (A, B) to R. The complete proof is described in the technical report [14].

### **5.1 Type Equality Declarations**

One of the primary sources of incompleteness in our algorithm is its inability to generalize the coinductive hypothesis. As an illustration, consider the following two types D and D', which differ only in their names but have the same structure.

$$T[x] \triangleq \oplus\{\mathbf{L} : T[T[x]], \mathbf{R} : x\} \qquad\qquad D \triangleq \oplus\{\mathbf{L} : T[D], \$ : \mathbf{1}\}$$
$$T'[x] \triangleq \oplus\{\mathbf{L} : T'[T'[x]], \mathbf{R} : x\} \qquad\qquad D' \triangleq \oplus\{\mathbf{L} : T'[D'], \$ : \mathbf{1}\}$$

To establish D ≡ D', our algorithm explores the **L** branch and checks T[D] ≡ T'[D']. A corresponding closure ⟨· ; T[D] ≡ T'[D']⟩ is added to Γ, and our algorithm then checks T[T[D]] ≡ T'[T'[D']]. This process repeats until it exceeds the depth bound and terminates with an inconclusive answer. What the algorithm never realizes is that T[x] ≡ T'[x] for all x ∈ Type; it fails to generalize to this coinductive hypothesis and only ever inserts closed equality constraints into Γ.

To allow a recourse, we permit the programmer to declare (in concrete syntax)

eqtype T[x] = T'[x]

an equality constraint that is easily verified by our algorithm. We then seed Γ in the equality algorithm with the corresponding closure from the eqtype constraint, which can then be used to establish D ≡ D':

$$\cdot \; ; \; \langle x \; ; \; T[x] \equiv T'[x] \rangle \vdash D \equiv D'$$

which, upon exploring the **L** branch, reduces to

$$\cdot \; ; \; \langle x \; ; \; T[x] \equiv T'[x] \rangle , \; \langle \cdot \; ; \; D \equiv D' \rangle \vdash T[D] \equiv T'[D'] \; ;$$

which holds under the substitution [D/x] as required by the def rule.

In the implementation, we first collect all the eqtype declarations in the program into a global set of closures Γ0. We then validate every eqtype declaration by checking V ; Γ0 ⊢ A ≡ B for every pair (A, B) (with free variables V) in the eqtype declarations. Essentially, this ensures that all equality declarations are valid with respect to each other. Finally, all equality checks are performed under this more general Γ0. The soundness of this approach is captured by the following more general theorem.

**Theorem 5 (Seeded Soundness).** For a valid set of eqtype declarations Γ0, if V ; Γ0 ⊢ A ≡ B, then ∀V. A ≡ B.

Our soundness proof can easily be modified to accommodate this requirement. Intuitively, since Γ<sup>0</sup> is valid, all closed instances of Γ<sup>0</sup> are already proven to be bisimilar. Thus, all properties of a type bisimulation are still preserved if all closed instances of Γ<sup>0</sup> are added to it.
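The algorithm of this section, including eqtype seeding, can be made concrete with a minimal executable sketch. It is not the Rast implementation: types are encoded as Python tuples, internal renaming is presupposed (every recursive type is a defined name), the depth bound is fixed at 1, and the def rule is simplified to compute σ' by syntactic matching on the first name's arguments and verify the second name's arguments with rigid equality. A seeded eqtype declaration is modeled by a non-empty initial list of closures.

```python
# Illustrative sketch only; encodings and names are assumptions, not Rast code.
# Types: ("plus", {l: A}) for ⊕, ("with", {l: A}) for &, ("tensor", A, B),
# ("one",), ("var", a), ("name", V, [args]) for V[A1,...,An].
DEPTH_BOUND = 1  # bound on closures per pair of type names

def subst_type(A, sigma):
    """Substitute types for type variables in A."""
    tag = A[0]
    if tag == "var":
        return sigma.get(A[1], A)
    if tag in ("plus", "with"):
        return (tag, {l: subst_type(B, sigma) for l, B in A[1].items()})
    if tag == "tensor":
        return (tag, subst_type(A[1], sigma), subst_type(A[2], sigma))
    if tag == "name":
        return (tag, A[1], [subst_type(B, sigma) for B in A[2]])
    return A  # ("one",)

def match(pat, term, sigma):
    """First-order matching: extend sigma so that pat[sigma] == term."""
    if pat[0] == "var":
        return sigma.setdefault(pat[1], term) == term
    if pat[0] != term[0]:
        return False
    if pat[0] in ("plus", "with"):
        return (pat[1].keys() == term[1].keys() and
                all(match(pat[1][l], term[1][l], sigma) for l in pat[1]))
    if pat[0] == "tensor":
        return match(pat[1], term[1], sigma) and match(pat[2], term[2], sigma)
    if pat[0] == "name":
        return (pat[1] == term[1] and
                all(match(p, t, sigma) for p, t in zip(pat[2], term[2])))
    return True  # ("one",)

def eq(A, B, sig, gamma, rigid=False):
    """Approximate A ≡ B under signature sig and stored closures gamma.
    Returns False both on counterexamples and on inconclusive answers."""
    if A[0] == "name" and B[0] == "name":
        # refl: same name, componentwise rigidly equal arguments
        if A[1] == B[1] and all(eq(x, y, sig, gamma, rigid=True)
                                for x, y in zip(A[2], B[2])):
            return True
        # def: close a loop against a stored closure
        for (V1, args1, V2, args2) in gamma:
            if V1 == A[1] and V2 == B[1]:
                s = {}  # sigma' found by matching, then checked rigidly
                if (all(match(p, t, s) for p, t in zip(args1, A[2])) and
                    all(eq(subst_type(p, s), t, sig, gamma, rigid=True)
                        for p, t in zip(args2, B[2]))):
                    return True
        if rigid:  # rigid equality may not expand definitions
            return False
        # expd: unfold both definitions, subject to the depth bound
        if sum(1 for (V1, _, V2, _) in gamma
               if V1 == A[1] and V2 == B[1]) >= DEPTH_BOUND:
            return False  # inconclusive: reported as failure
        pa, bodyA = sig[A[1]]
        pb, bodyB = sig[B[1]]
        gamma = gamma + [(A[1], A[2], B[1], B[2])]
        return eq(subst_type(bodyA, dict(zip(pa, A[2]))),
                  subst_type(bodyB, dict(zip(pb, B[2]))), sig, gamma)
    if A[0] != B[0]:
        return False  # constructor clash: counterexample
    if A[0] in ("plus", "with"):
        return (A[1].keys() == B[1].keys() and
                all(eq(A[1][l], B[1][l], sig, gamma, rigid) for l in A[1]))
    if A[0] == "tensor":
        return (eq(A[1], B[1], sig, gamma, rigid) and
                eq(A[2], B[2], sig, gamma, rigid))
    if A[0] == "var":
        return A[1] == B[1]
    return True  # ("one",)

# T, T', D, D' from Section 5.1, with T' and D' spelled T2 and D2
sig = {
    "T":  (["x"], ("plus", {"L": ("name", "T",  [("name", "T",  [("var", "x")])]),
                            "R": ("var", "x")})),
    "T2": (["x"], ("plus", {"L": ("name", "T2", [("name", "T2", [("var", "x")])]),
                            "R": ("var", "x")})),
    "D":  ([], ("plus", {"L": ("name", "T",  [("name", "D",  [])]), "$": ("one",)})),
    "D2": ([], ("plus", {"L": ("name", "T2", [("name", "D2", [])]), "$": ("one",)})),
}
assert not eq(("name", "D", []), ("name", "D2", []), sig, [])  # inconclusive
# seeding with the open closure <x; T[x] ≡ T'[x]> from an eqtype declaration
seed = [("T", [("var", "x")], "T2", [("var", "x")])]
assert eq(("name", "D", []), ("name", "D2", []), sig, seed)    # now succeeds
```

The two assertions reproduce the behavior described above: without the seeded closure the check of D ≡ D' hits the depth bound inside the **L** branch, while with it the def rule closes the loop under the substitution [D/x].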

One final note on the rule of reflexivity: a type name may not actually depend on its parameter. As a simple example, we have V[α] = **1**; a more complicated one would be V[α] = ⊕{a : V[V[α]], b : **1**}. When applying reflexivity, we would like to conclude that V[A] ≡ V[B] regardless of A and B. This could easily be established with an equality type declaration eqtype V[α] = V[β]. To avoid this syntactic overhead for the programmer, we determine, for each parameter α of each type name V, whether its definition is nonvariant in α. This information is recorded in the signature and used when applying the reflexivity rule, which ignores nonvariant arguments.

# **6 Formal Language Description**

In this section, we present the program constructs we have designed to realize nested polymorphism; these have been integrated into the Rast language [17,18,19] to support general-purpose programming. The underlying base system of session types is derived from a Curry-Howard interpretation [7,8] of intuitionistic linear logic [25]. The key idea is that an intuitionistic linear sequent A1, A2, …, An ⊢ A is interpreted as the interface to a process P. We label each of the antecedents with a channel name xi and the succedent with a channel name z. The xi are the channels used by P, and z is the channel provided by P.

$$((x\_1:A\_1)\ (x\_2:A\_2)\dots(x\_n:A\_n)\vdash P::(z:C))$$

The resulting judgment formally states that process P provides a service of session type C along channel z, while using the services of session types A1,...,A<sup>n</sup> provided along channels x1,...,x<sup>n</sup> respectively. All these channels must be distinct. We abbreviate the antecedent of the sequent by Δ.

**Table 1.** Session types with operational description

Due to the presence of type variables, the formal typing judgment is extended with V and written as

$$\mathcal{V} \; ; \; \Delta \vdash_{\Sigma} P :: (x : A)$$

where V stores the type variables α, Δ represents the linear antecedents xi : Ai, P is the process expression, and x : A is the linear succedent. We maintain the invariant that all free type variables in Δ, P, and A are contained in V. Finally, Σ is a fixed valid signature containing type and process definitions. Table 1 gives an overview of the session types, their associated process terms, their continuations (both in types and terms), and their operational description. For each type, the first line describes the provider's viewpoint, while the second line describes the client's matching but dual viewpoint.

We formalize the operational semantics as a system of multiset rewriting rules [9]. We introduce semantic objects proc(c, P) and msg(c, M), denoting that process P (respectively, message M) provides along channel c. A process configuration is a multiset of such objects in which any two provided channels are distinct.

### **6.1 Basic Session Types**

We briefly review the structural types already present in the Rast language. The internal choice type constructor ⊕{ℓ : Aℓ}ℓ∈L is an n-ary labeled generalization of the additive disjunction A ⊕ B. Operationally, it requires the provider of x : ⊕{ℓ : Aℓ}ℓ∈L to send a label k ∈ L on channel x and continue to provide type Ak. The corresponding process term is written as (x.k ; P), where the continuation P provides type x : Ak. Dually, the client must branch based on the label received on x using the process term case x (ℓ ⇒ Qℓ)ℓ∈L, where Qℓ is the continuation in the ℓ-th branch.

$$\frac{(k \in L) \quad \mathcal{V} \; ; \; \Delta \vdash P :: (x : A\_k)}{\mathcal{V} \; ; \; \Delta \vdash (x.k \; ; \; P) :: (x : \oplus \{\ell : A\_\ell\}\_{\ell \in L})} \; \oplus R$$

$$\frac{(\forall \ell \in L) \quad \mathcal{V} \; ; \; \Delta , (x : A\_\ell) \vdash Q\_\ell :: (z : C)}{\mathcal{V} \; ; \; \Delta , (x : \oplus \{\ell : A\_\ell\}\_{\ell \in L}) \vdash \mathsf{case} \; x \; (\ell \Rightarrow Q\_\ell)\_{\ell \in L} :: (z : C)} \; \oplus L$$

Communication is asynchronous, so that the client (c.k ; Q) sends a message k along c and continues as Q without waiting for it to be received. As a technical device to ensure that consecutive messages on a channel arrive in order, the sender also creates a fresh continuation channel c', so that the message k is actually represented as (c.k ; c ↔ c') (read: send k along c and continue along c'). When the message k is received along c, we select branch k and also substitute the continuation channel c' for c.

 $(\oplus S) : \mathsf{proc}(c, c.k \; ; \; P) \mapsto \mathsf{proc}(c', P[c'/c]), \mathsf{msg}(c, c.k \; ; \; c \leftrightarrow c')$  $(\oplus C) : \mathsf{msg}(c, c.k \; ; \; c \leftrightarrow c'), \mathsf{proc}(d, \mathsf{case} \; c \; (\ell \Rightarrow Q\_{\ell})\_{\ell \in L}) \mapsto \mathsf{proc}(d, Q\_{k}[c'/c])$ 
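The two rules (⊕S) and (⊕C) can be animated with a toy multiset-rewriting interpreter. The object encodings and the `step` driver below are illustrative assumptions, not the Rast runtime; in particular, the substitution [c'/c] in continuations is elided by letting each process perform at most one action.

```python
def fresh(c, config):
    """A continuation channel name not yet provided in config (assumption:
    priming a name makes it fresh, mirroring c' in the rules)."""
    c2 = c + "'"
    while c2 in config:
        c2 += "'"
    return c2

def step(config):
    """Apply one rule (⊕S) or (⊕C); return True if a rewrite fired."""
    for c, obj in list(config.items()):
        # (⊕S): the provider sends label k on its own channel c, leaving a
        # message object behind and continuing along a fresh channel c'
        if obj[0] == "proc" and obj[1][0] == "send":
            _, (_, k, cont) = obj
            c2 = fresh(c, config)
            config[c2] = ("proc", cont)   # proc(c', P)  -- [c'/c] elided
            config[c] = ("msg", k, c2)    # msg(c, c.k ; c <-> c')
            return True
        # (⊕C): the message on c meets the client case-analyzing c
        if obj[0] == "msg":
            _, k, c2 = obj
            for d, cl in list(config.items()):
                if cl[0] == "proc" and cl[1][0] == "case" and cl[1][1] == c:
                    config[d] = ("proc", cl[1][2][k])  # proc(d, Q_k)
                    del config[c]
                    return True
    return False

config = {
    "c": ("proc", ("send", "k", ("halt",))),                        # c.k ; halt
    "d": ("proc", ("case", "c", {"k": ("halt",), "l": ("halt",)})),
}
while step(config):
    pass
# after (⊕S) then (⊕C): the provider continues on c', the client chose branch k
assert config == {"c'": ("proc", ("halt",)), "d": ("proc", ("halt",))}
```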

The external choice constructor &{ℓ : Aℓ}ℓ∈L generalizes additive conjunction and is the dual of internal choice, reversing the roles of the provider and client. The corresponding rules for statics and dynamics are skipped for brevity and presented in the technical report [14].

The tensor operator A ⊗ B prescribes that the provider of x : A ⊗ B send a channel, say w of type A, and continue to provide type B. The corresponding process term is (send x w ; P), where P is the continuation. Correspondingly, its client must receive a channel on x using the term (y ← recv x ; Q), binding it to the variable y and continuing to execute Q.

$$\frac{\mathcal{V} \; ; \; \Delta \vdash P :: (x : B)}{\mathcal{V} \; ; \; \Delta, (y : A) \vdash (\mathsf{send} \; x \; y \; ; \; P) :: (x : A \otimes B)} \otimes R$$

$$\frac{\mathcal{V} \; ; \; \Delta, (y : A), (x : B) \vdash Q :: (z : C)}{\mathcal{V} \; ; \; \Delta, (x : A \otimes B) \vdash (y \leftarrow \mathsf{recv} \; x \; ; \; Q) :: (z : C)} \otimes L$$

Operationally, the provider (send c d ; P) sends the channel d and the continuation channel c' along c as a message and continues by executing P. The client receives the channel d and the continuation channel c', substituting them appropriately.

(⊗S) : proc(c, send c d ; P) ↦ proc(c', P[c'/c]), msg(c, send c d ; c ↔ c')
(⊗C) : msg(c, send c d ; c ↔ c'), proc(e, x ← recv c ; Q) ↦ proc(e, Q[c', d/c, x])

The dual operator A ⊸ B allows the provider to receive a channel of type A and continue to provide type B. The client of A ⊸ B, on the other hand, sends a channel of type A and continues to use B, with process terms dual to those of ⊗.

The type **1** indicates termination, requiring that the provider of x : **1** send a close message, formally written as (close x), and then terminate the communication. Correspondingly, the client of x : **1** uses the term (wait x ; Q) to wait for x to terminate before continuing to execute Q.

A forwarding process x ↔ y identifies the channels x and y so that any further communication along either x or y will be along the unified channel. Its typing rule corresponds to the logical rule of identity.

$$\overline{\mathcal{V} \; ; \; y: A \vdash (x \leftrightarrow y) :: (x:A)} \quad \mathsf{id}$$

Operationally, a process c ↔ d forwards any message M that arrives on d to c and vice-versa. Since channels are used linearly, the forwarding process can then terminate, ensuring proper renaming, as exemplified in the rules below.

(id+C) : msg(d, M), proc(c, c ↔ d) ↦ msg(c, M[c/d])
(id−C) : proc(c, c ↔ d), msg(e, M(c)) ↦ msg(e, M(c)[d/c])

We write M(c) to indicate that c must occur in message M ensuring that M is the sole client of c.

*Process Definitions* Process definitions have the form Δ ⊢ f[α] = P :: (x : A), where f is the name of the process and P its definition, with Δ being the channels used by f and x : A the offered channel. In addition, α is a sequence of type variables that Δ, P, and A can refer to. These type variables are implicitly universally quantified at the outermost level and represent prenex polymorphism. All definitions are collected in the fixed global signature Σ. For a valid signature, we require that α ; Δ ⊢ P :: (x : A) for every definition, thereby allowing definitions to be mutually recursive. A new instance of a defined process f can be spawned with the expression (x ← f[A] y ; Q), where y is a sequence of channels matching the antecedents Δ and A is a sequence of types matching the type variables α. The newly spawned process will use all channels in y and provide x to the continuation Q.

$$\begin{array}{c} \overline{y':B'} \vdash f[\overline{\alpha}] = P\_f :: (x':B) \in \Sigma\\ \Delta' = \overline{(y:B')[\overline{A}/\overline{\alpha}]} \quad \mathcal{V} \mathrel{\mathop{:}} \quad \Delta, (x:B[\overline{A}/\overline{\alpha}]) \vdash Q :: (z:C)\\ \hline \mathcal{V} \mathrel{\mathop{:}} \quad \Delta, \Delta' \vdash (x \leftarrow f[\overline{A}] \; \overline{y} \; ; \; Q) :: (z:C) \end{array} \text{def}$$

The declaration of f is looked up in the signature Σ (first premise), and A is substituted for α while matching the types in Δ and y (second premise). Similarly, the freshly created channel x has type B from the signature, with A substituted for α.

The complete set of rules for the type system and the operational semantics for our language are presented in [14].

### **6.2 Type Safety**

The extension of session types with nested polymorphism is proved type safe by the standard theorems of preservation and progress, also known as session fidelity and deadlock freedom. At runtime, a program is represented by a multiset of semantic objects denoting processes and messages, called a configuration.

$$\mathcal{S} \;::=\; \cdot \;\mid\; \mathcal{S}, \mathcal{S}' \;\mid\; \mathsf{proc}(c, P) \;\mid\; \mathsf{msg}(c, M)$$

We say that proc(c, P) (or msg(c, M)) provides channel c. We stipulate that no two distinct semantic objects in a configuration provide the same channel.

*Type Preservation* The key to preservation is defining the rules to type a configuration. We define a well-typed configuration using the judgment Δ1 ⊢Σ S :: Δ2, denoting that configuration S uses channels Δ1 and provides channels Δ2. A configuration is always typed with respect to a valid signature Σ. Since the signature Σ is fixed, we elide it from the presentation.

$$\frac{}{\Delta \vdash (\cdot) :: \Delta} \;\mathsf{emp} \qquad \frac{\Delta_1 \vdash \mathcal{S}_1 :: \Delta_2 \quad \Delta_2 \vdash \mathcal{S}_2 :: \Delta_3}{\Delta_1 \vdash (\mathcal{S}_1, \mathcal{S}_2) :: \Delta_3} \;\mathsf{comp}$$

$$\frac{\cdot \; ; \; \Delta \vdash P :: (x : A)}{\Delta \vdash \mathsf{proc}(x, P) :: (x : A)} \;\mathsf{proc} \qquad \frac{\cdot \; ; \; \Delta \vdash M :: (x : A)}{\Delta \vdash \mathsf{msg}(x, M) :: (x : A)} \;\mathsf{msg}$$

**Fig. 2.** Typing rules for a configuration

The rules for typing a configuration are defined in Figure 2. The emp rule states that an empty configuration does not consume any channels and provides all channels it uses. The comp rule composes two configurations S1 and S2, where S1 provides the channels Δ2 that S2 uses. The proc rule creates a singleton configuration out of a process. Since configurations are runtime objects, they do not refer to any free type variables, and V is empty. The msg rule is analogous.

*Global Progress* To state progress, we need to define a poised process [39]. A process proc(c, P) is poised if it is trying to receive a message on c. Dually, a message msg(c, M) is poised if it is sending along c. A configuration is poised if every message or process in the configuration is poised. Intuitively, this represents that the configuration is trying to communicate externally along one of the channels it uses or provides.

**Theorem 6 (Type Safety).** For a well-typed configuration Δ1 ⊢Σ S :: Δ2: (i) (*Preservation*) If S ↦ S', then Δ1 ⊢Σ S' :: Δ2. (ii) (*Progress*) Either S is poised, or S ↦ S' for some S'.


Proof. Preservation is proved by case analysis on the rules of the operational semantics. First, we invert the derivation of the current configuration S and use the premises to assemble a new derivation for S'. Progress is proved by induction on the right-to-left typing of S, so that either S is empty (and therefore poised), or S = (D, proc(c, P)), or S = (D, msg(c, M)). By the induction hypothesis, either D ↦ D' or D is poised. In the former case, S takes a step (since D does). In the latter case, we analyze the cases for P and M, applying multiple steps of inversion to show that in each case either S can take a step or is poised.

# **7 Relationship to Context-Free Session Types**

As ordinarily formulated, session types express communication protocols that can be described by regular languages [45]. In particular, the type structure is necessarily tail recursive. Context-free session types (CFSTs) were introduced by Thiemann and Vasconcelos [45] as a way to express a class of communication protocols that are not limited to tail recursion. CFSTs express protocols that can be described by single-state, real-time DPDAs that accept by empty stack [1,34].

Despite their name, the essence of CFSTs is not their connection to a particular subset of the (deterministic) context-free languages. Rather, the essence of CFSTs is that session types are enriched to admit a notion of sequential composition. Nested session types are strictly more expressive than CFSTs, in the sense that there exists a proper fragment of nested session types that is closed under a notion of sequential composition. (In keeping with process algebras like ACP [2], we define sequential composition to be an operation that satisfies the laws of a right-distributive monoid.)

Consider (up to α-, β-, η-equivalence) the linear, tail functions from types to types built from unary type constructors only:

$$S, T ::= \hat{\lambda}\alpha.\,\alpha \;\mid\; \hat{\lambda}\alpha.\,V[S\,\alpha] \;\mid\; \hat{\lambda}\alpha.\,\oplus\{\ell : S_\ell\,\alpha\}_{\ell\in L} \;\mid\; \hat{\lambda}\alpha.\,\&\{\ell : S_\ell\,\alpha\}_{\ell\in L} \;\mid\; \hat{\lambda}\alpha.\,A \otimes (S\,\alpha) \;\mid\; \hat{\lambda}\alpha.\,A \multimap (S\,\alpha)$$

The linear, tail nature of these functions allows the type α to be thought of as a continuation type for the session. The functions S are closed under function composition, and the identity function, $\hat{\lambda}\alpha.\,\alpha$, is included in this class. Moreover, because these functions are tail functions, composition right-distributes over the various logical connectives in the following sense:

$$\begin{aligned} (\hat{\lambda}\alpha.\,V[S\,\alpha])\circ T &= \hat{\lambda}\alpha.\,V[(S\circ T)\,\alpha] \\ (\hat{\lambda}\alpha.\,\oplus \{\ell:S\_{\ell}\,\alpha\}\_{\ell\in L})\circ T &= \hat{\lambda}\alpha.\,\oplus \{\ell:(S\_{\ell}\circ T)\,\alpha\}\_{\ell\in L} \\ (\hat{\lambda}\alpha.\,A\otimes(S\,\alpha))\circ T &= \hat{\lambda}\alpha.\,A\otimes((S\circ T)\,\alpha) \end{aligned}$$

and similarly for & and ⊸. Together with the monoid laws of function composition, these distributive properties justify defining sequential composition as S ; T = S ◦ T.
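The right-distribution laws can be checked concretely by modeling tail functions as ordinary functions on a tuple encoding of types. The encoding and the helper `compose` are illustrative assumptions, not part of the formal development:

```python
# ⊕ is encoded as ("plus", {...}) and ⊗ as ("tensor", A, B); the type
# variable α is the function argument. Illustrative encoding only.
def compose(S, T):  # S ; T  =  S ∘ T
    return lambda alpha: S(T(alpha))

A = ("one",)
identity = lambda alpha: alpha                     # λ̂α. α
Sr = lambda alpha: ("tensor", A, alpha)            # λ̂α. A ⊗ α
S = lambda alpha: ("plus", {"l": identity(alpha),  # λ̂α. ⊕{l : α, r : A ⊗ α}
                            "r": Sr(alpha)})
T = lambda alpha: ("tensor", A, alpha)             # λ̂α. A ⊗ α

# (λ̂α. ⊕{l : S_l α}) ∘ T  =  λ̂α. ⊕{l : (S_l ∘ T) α}
alpha = ("var", "k")  # an arbitrary continuation type
lhs = compose(S, T)(alpha)
rhs = ("plus", {"l": compose(identity, T)(alpha),
                "r": compose(Sr, T)(alpha)})
assert lhs == rhs
# monoid laws: the identity function is a unit for composition
assert compose(identity, T)(alpha) == T(alpha) == compose(T, identity)(alpha)
```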

This suggests that although many details distinguish our work from CFSTs, nested session types cover the essence of the sequential composition underlying context-free session types. However, even stating a theorem that every CFST process can be translated into a well-typed process in our system of nested session types is difficult, because the two type systems differ in many details: we include ⊗ and ⊸ as session types, but CFSTs do not; CFSTs use a complex kinding system to incorporate unrestricted session types and combine session types with ordinary function types; the CFST system uses classical typing for session types and a procedure of type normalization, whereas our types are intuitionistic and do not rely on normalization; and the CFST typing rules are based on natural deduction rather than the sequent calculus. With all of these differences, a formal translation, theorem, and proof would not be very illuminating beyond the essence already described here. Empirically, we can also give analogues of the published examples for CFSTs (see, e.g., the first two examples of Section 9).

Finally, nested session types are strictly more expressive than CFSTs. Recall from Section 2 the language $L_3 = \{\mathbf{L}^n \mathbf{a}\,\mathbf{R}^n \mathbf{a} \mid n > 0\} \cup \{\mathbf{L}^n \mathbf{b}\,\mathbf{R}^n \mathbf{b} \mid n > 0\}$, which can be expressed using nested session types with two type parameters used in an essential way. Moreover, Korenjak and Hopcroft [34] observe that this language

cannot be recognized by a single-state, real-time DPDA that uses empty stack acceptance, and thus, CFSTs cannot express the language L3. More broadly, nested types allow for finitely many states and acceptance by empty stack or final state, while CFSTs only allow a single state and empty stack acceptance.

# **8 Implementation**

We have implemented a prototype for nested session types and integrated it with the open-source Rast system [17]. Rast (Resource-aware session types) is a programming language which implements the intuitionistic version of session types [7] with support for arithmetic refinements [18], ergometric [16] and temporal [15] types for complexity analysis. Our prototype extension is implemented in Standard ML (8011 lines of code) containing a lexer and parser (1214 lines), a type checker (3001 lines) and an interpreter (201 lines) and is well-documented. The prototype is available in the Rast repository [13].

*Syntax* A program contains a series of mutually recursive type and process declarations and definitions, concretely written as

```
type V[x1]...[xk] = A
decl f[x1]...[xk] : (c1 : A1) ... (cn : An) |- (c : A)
proc c <- f[x] c1 ... cn = P
```
The type V[x1, …, xk] is represented in concrete syntax as V[x1]...[xk]. The first line is a type definition, where V is the type name parameterized by type variables x1, …, xk and A is its definition. The second line is a process declaration, where f is the process name (parameterized by type variables x1, …, xk), (c1 : A1) … (cn : An) are the used channels with their corresponding types, and the offered channel is c of type A. Finally, the last line is a process definition for the same process f, defined using the process expression P. We use a hand-written lexer and shift-reduce parser to read an input file and generate the corresponding abstract syntax tree of the program. We use a hand-written parser rather than a parser generator in order to anticipate the most common syntax errors that programmers make and respond with the best possible error messages.

Once the program is parsed and its abstract syntax tree is extracted, we perform a validity check on it. This includes checking that type definitions, and process declarations and definitions are closed w.r.t. the type variables in scope. To simplify and improve the efficiency of the type equality algorithm, we also assign internal names to type subexpressions parameterized over their free index variables. These internal names are not visible to the programmer.

*Type Checking and Error Messages* The implementation is carefully designed to produce precise error messages. To that end, we store the extent (source location) information in the abstract syntax tree and use it to highlight the source of each error. We also follow a bidirectional type checking [40] algorithm, reconstructing intermediate types starting from the initial types provided in the declaration. This helps us precisely identify the source of an error. Another particularly helpful technique has been type compression. Whenever the type checker expands a type V[A] defined as V[α] = B into B[A/α], we record a reverse mapping from B[A/α] to V[A]. When printing types for error messages, this mapping is consulted, and complex types may be compressed to much simpler forms, greatly aiding the readability of error messages.

# **9 More Examples**

*Expression Server* We adapt the example of an arithmetic expression from prior work on context-free session types [45]. The type of the server is defined as

```
type bin = +{ b0 : bin, b1 : bin, $ : 1 }
type tm[K] = +{ const : bin * K,
                add : tm[tm[K]],
                double : tm[K] }
```
The type bin represents a binary natural number constant. A process providing a binary number sends a stream of bits, b0 and b1, starting with the least significant bit and eventually terminated by $.

An arithmetic term, parameterized by continuation type K can have one of three forms: a constant, the sum of two terms, or the double of a term. Consequently, the type tm[K] ensures that a process providing tm[K] is a well-formed term: it either sends the const label followed by sending a constant binary number of type bin and continues with type K; or it sends the add label and continues with tm[tm[K]], where the two terms denote the two summands; or it sends the double label and continues with tm[K]. In particular, the continuation type tm[tm[K]] in the add branch enforces that the process must send exactly two summands for sums.

As a first illustration, consider two binary constants a and b, and suppose that we want to create the expression a + 2b. We can issue commands to the expression server in a prefix notation to obtain a + 2b, as shown in the following exp[K] process, which is parameterized by a continuation type K.

```
decl exp[K] : (a : bin) (b : bin) (k : K) |- (e : tm[K])
proc e <- exp[K] a b k =
  e.add ; e.const ; send e a ;    % (b : bin) (k : K) |- (e : tm[K])
  e.double ; e.const ; send e b ; % (k : K) |- (e : K)
 e <-> k
```
In prefix notation, a + 2b would be written + (a) (2 b), which is exactly the form followed by the exp process: the process sends add, followed by const and the number a, followed by double, const, and b. Finally, the process continues at type K by forwarding k to e (intermediate typing contexts are shown on the right).

To evaluate a term, we can define an eval process, parameterized by type K:

decl eval[K] : (t : tm[K]) |- (v : bin * K)

The eval process uses the channel t : tm[K] as an argument and offers v : bin * K. The process evaluates the term t and sends its binary value along v. The technical report contains the full implementation [14].
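The behavior of eval can be imitated sequentially by reading the prefix token stream that exp produces. The stream encoding and the name `eval_tm` below are illustrative assumptions (binary numbers are replaced by Python ints), not the Rast process:

```python
# Terms arrive as a prefix token stream, as sent by exp; evaluation
# returns the value together with the unconsumed continuation.
def eval_tm(s):
    tag, rest = s[0], s[1:]
    if tag == "const":
        return rest[0], rest[1:]   # the constant, here a Python int
    if tag == "add":
        v1, rest = eval_tm(rest)   # first summand
        v2, rest = eval_tm(rest)   # second summand (enforced by tm[tm[K]])
        return v1 + v2, rest
    # tag == "double"
    v, rest = eval_tm(rest)
    return 2 * v, rest

# a + 2b in prefix form, as sent by exp: add, const a, double, const b
a, b = 5, 3
assert eval_tm(["add", "const", a, "double", "const", b, "done"]) == (11, ["done"])
```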

*Serializing binary trees* Another example from [45] is serializing binary trees. Here we adapt that example to our system. Binary trees can be described by:

```
type Tree[a] = +{ node : Tree[a] * a * Tree[a], leaf : 1 }
```
These trees are polymorphic in the type of data stored at each internal node. A tree is either an internal node or a leaf, with the internal nodes storing channels that emit the left subtree, data, and right subtree. Owing to the multiple channels stored at each node, these trees do not exist in a serial form.

We can, however, use a different type to represent serialized trees:

```
type STree[a][K] = +{ nd : STree[a][a * STree[a][K]] , lf : K }
```
A serialized tree is a stream of node and leaf labels, nd and lf, parameterized by a continuation type K. Like add in the expression server, the label nd continues with type STree[a][a * STree[a][K]]: the label nd is followed by the serialized left subtree, which itself continues by sending the data stored at the internal node and then the serialized right subtree, which continues with type K.<sup>3</sup>

Using these types, it is relatively straightforward to implement processes that serialize and deserialize such trees. The process serialize can be declared with:

```
decl serialize[a][K] : (t : Tree[a]) (k : K) |- (s : STree[a][K])
```
This process uses channels t and k that hold the tree and continuation, and offers that tree's serialization along channel s. If the tree is only a leaf, then the process forwards to the continuation. Otherwise, if the tree begins with a node, then the serialization begins with nd. A recursive call to serialize serves to serialize the right subtree with the given continuation. A subsequent recursive call serializes the left subtree with the data together with the right subtree's serialization as the new continuation.
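Under the same modelling assumptions as before (plain Python standing in for the session-typed processes; the representation is ours), serialize can be sketched as a function threading a continuation list, mirroring the two recursive calls just described:

```python
# A plain-Python model (ours) of serialize: a Tree is either ("leaf",)
# or ("node", left, data, right); its serialization is a flat list of
# "nd"/"lf" labels with node data inlined after each left subtree,
# ending in a continuation k (here: any Python list).

def serialize(tree, k):
    if tree[0] == "leaf":
        return ["lf"] + k        # lf continues as the continuation K
    _, left, data, right = tree
    # nd, then the left subtree, whose continuation carries the data
    # followed by the serialized right subtree (which continues with k).
    return ["nd"] + serialize(left, [data] + serialize(right, k))

leaf = ("leaf",)
tree = ("node", ("node", leaf, "x", leaf), "y", leaf)
s = serialize(tree, ["done"])
# s == ["nd", "nd", "lf", "x", "lf", "y", "lf", "done"]
```

Reading the output stream front to back reproduces a preorder traversal, which is what the type STree[a][K] prescribes.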

It is also possible to implement a process for deserializing trees; because of space limitations, we only show its declaration here.

```
decl deserialize[a][K] : (s : STree[a][K]) |- (tk : Tree[a] * K)
```
*Generalized tries for binary trees* Using nested types in Haskell, prior work [28] describes an implementation of generalized tries that represent mappings on binary trees. Our type system is expressive enough to represent such generalized tries. We can reuse the type Tree[a] of binary trees given above. The type Trie[a][b] describes tries that represent mappings from Tree[a] to type b:

<sup>3</sup> The presence of a \* means that this is not a true serialization because it sends a separate channel along which the data of type a is emitted. But there is no uniform mechanism for serializing polymorphic data, so this is as close to a true serialization as possible. Concrete instances of type Tree with, say, data of base type int could be given a true serialization by "inlining" the data of type int in the serialization.

```
type Trie[a][b] = &{ lookup_leaf : b,
                     lookup_node : Trie[a][a -o Trie[a][b]] }
```
A process for looking up a tree in such tries can be declared by:

```
decl lookup_tree[a][b] : (m : Trie[a][b]) (t : Tree[a]) |- (v : b)
```
To look up a tree in a trie, first determine whether that tree is a leaf or a node. If the tree is a leaf, then sending lookup\_leaf to the trie returns the value of type b associated with that tree in the trie.

Otherwise, if the tree is a node, then sending lookup\_node to the trie results in a trie of type Trie[a][a -o Trie[a][b]] that represents a mapping from left subtrees to type a -o Trie[a][b]. We then look up the left subtree in this trie, resulting in a process of type a -o Trie[a][b] to which we send the data stored at our original tree's root. That results in a trie of type Trie[a][b] that represents a mapping from right subtrees to type b. Therefore, we finally look up the right subtree in this new trie and obtain a result of type b, as desired.
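As a sketch of how lookup\_tree and build\_trie interact, here is a plain-Python model (ours; all names are assumptions, not Rast code). Tries are represented lazily with thunks, since a trie over all trees is infinite:

```python
# A lazy-trie model (ours) of generalized tries for binary trees.
# A trie is a dict with a value thunk for the leaf case and, for the
# node case, a thunk producing a trie mapping left subtrees to
# functions from node data to tries over right subtrees.

def node(left, data, right):
    return ("node", left, data, right)

LEAF = ("leaf",)

def build_trie(f):
    """Trie (lazily built) representing the mapping f : Tree -> b."""
    return {
        "lookup_leaf": lambda: f(LEAF),
        "lookup_node": lambda: build_trie(
            # a left subtree l maps to: data -> trie over right subtrees
            lambda l: lambda a: build_trie(
                lambda r: f(node(l, a, r)))),
    }

def lookup_tree(trie, t):
    if t[0] == "leaf":
        return trie["lookup_leaf"]()
    _, l, a, r = t
    # look up the left subtree, apply the node data, then the right one
    return lookup_tree(lookup_tree(trie["lookup_node"](), l)(a), r)

def size(t):
    return 0 if t[0] == "leaf" else 1 + size(t[1]) + size(t[3])

trie = build_trie(size)                       # trie for the size function
t = node(node(LEAF, "x", LEAF), "y", LEAF)    # lookup_tree(trie, t) == 2
```

The three steps in lookup_tree correspond exactly to the protocol: send lookup_node, resolve the left subtree, send the data, resolve the right subtree.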

We can define a process that constructs a trie from a function on trees:

```
decl build_trie[a][b] : (f : Tree[a] -o b) |- (m : Trie[a][b])
```
Both lookup\_tree and build\_trie can be seen as analogues of deserialize and serialize, respectively, converting a lower-level representation to a higher-level one and vice versa. These types and declarations mean that tries represent total mappings; partial mappings are also possible, at the expense of some additional complexity.

All our examples have been implemented and type checked in the open-source Rast repository [13]. We have also implemented standard polymorphic data structures such as lists, stacks, and queues.

# **10 Further Related Work**

To our knowledge, our work is the first proposal of polymorphic recursion using nested type definitions in session types. Thiemann and Vasconcelos [45] use polymorphic recursion to update the channel between successive recursive calls but do not allow type constructors or nested types. An algorithm to check type equivalence for the non-polymorphic fragment of context-free session types has been proposed by Almeida et al. [1].

Other forms of polymorphic session types have also been considered in the literature. Gay [24] studies bounded polymorphism associated with branch and choice types in the presence of subtyping. He mentions recursive types (which are used in some examples) as future work, but does not mention parametric type definitions or nested types. Bono and Padovani [4,5] propose (bounded) polymorphism to type the endpoints in copyless message-passing programs inspired by session types, but they do not have nested types. Following Kobayashi's approach [33], Dardha et al. [12] provide an encoding of session types relying on linear and variant types and present an extension to enable parametric and bounded polymorphism (to which recursive types were added separately [11]) but not parametric type definitions nor nested types. Caires et al. [6] and Perez et al. [38] provide behavioral polymorphism and a relational parametricity principle for session types, but without recursive types or type constructors.

Nested session types bear important similarities with first-order cyclic terms, as observed by Jančar. Jančar [30] proves that the trace equivalence problem of first-order grammars is decidable, following the original ideas by Stirling for the language equality problem in deterministic pushdown automata [43]. These ideas were also reformulated by Sénizergues [41]. Henry and Sénizergues [27] proposed the only practical algorithm that we are aware of for deciding the language equivalence problem on deterministic pushdown automata. Preliminary experiments show that such a generic implementation, even if complete in theory, is a poor match for the demands made by our type checker.

# **11 Conclusion**

Nested session types extend binary session types with parameterized type definitions. This extension enables us to express polymorphic data structures just as naturally as in functional languages. The proposed types are able to capture sequences of communication actions described by deterministic context-free languages recognized by deterministic pushdown automata with several states, that accept by empty stack or by final state. In this setting, we show that type equality is decidable. To offset the complexity of type equality, we give a practical type equality algorithm that is sound, efficient, but incomplete.

In the future, we plan to explore subtyping for nested types. In particular, since the language inclusion problem for simple languages [22] is undecidable, we believe subtyping can be reduced to inclusion and would also be undecidable. Despite this negative result, it would be interesting to design an algorithm to approximate subtyping, which would significantly increase the class of programs that can be type checked in the system. In another direction, since Rast [17] supports arithmetic refinements for lightweight verification, it would be interesting to explore how refinements interact with polymorphic type parameters, namely in the presence of subtyping. We would also like to explore examples where the current type equality is not adequate. Finally, protocols in distributed algorithms such as consensus or leader election (Raft, Paxos, etc.) depend on unbounded memory and cannot usually be expressed with finite control structure. In future work, we would like to see whether these protocols can be expressed with nested session types.

Acknowledgements. Support for this research was provided by the Fundação para a Ciência e a Tecnologia (Portuguese Foundation for Science and Technology) through the Carnegie Mellon Portugal Program – Visiting Faculty Program and through the LASIGE Research Unit, ref. UIDB/00408/2020, and by the National Science Foundation under SaTC Award 1801369, CAREER Award 1845514 and Grant No. 1718276.

# **References**


Institute for Software Technology of The United Nations University, Lisbon, Portugal, March 18–20, 2002, Revised Papers. Lecture Notes in Computer Science, vol. 2757, pp. 439–453. Springer (2002). https://doi.org/10.1007/978-3-540-40007-3_26


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Coupled Relational Symbolic Execution for Differential Privacy**

Gian Pietro Farina<sup>1</sup>, Stephen Chong<sup>2</sup>, and Marco Gaboardi<sup>3</sup>

<sup>1</sup> University at Buffalo SUNY, Buffalo, USA, gianpiet@buffalo.edu <sup>2</sup> Harvard University, Cambridge, USA, chong@seas.harvard.edu <sup>3</sup> Boston University, Boston, USA, gaboardi@bu.edu

**Abstract.** Differential privacy is a de facto standard in data privacy, with applications in the private and public sectors. Most of the techniques that achieve differential privacy are based on a judicious use of randomness. However, reasoning about randomized programs is difficult and error prone. For this reason, several techniques have recently been proposed to support designers in proving programs differentially private or in finding violations of it.

In this work we propose a technique based on symbolic execution for reasoning about differential privacy. Symbolic execution is a classic technique used for testing, counterexample generation and to prove absence of bugs. Here we use symbolic execution to support these tasks specifically for differential privacy. To achieve this goal, we design a relational symbolic execution technique which supports reasoning about probabilistic coupling, a formal notion that has been shown useful to structure proofs of differential privacy. We show how our technique can be used to both verify and find violations to differential privacy.

# **1 Introduction**

Differential Privacy [8] has become a de facto gold standard definition of privacy for statistical analysis. This success is mostly due to the generality of the definition, its robustness, and its compositionality. However, getting differential privacy right in practice is a hard task. Even privacy experts have released fragile code subject to attacks [13, 17] and published incorrect algorithms [16]. This challenge has motivated the development of techniques to support programmers in showing their algorithms differentially private. Among the techniques that have been proposed there are type systems [12,18,20,24,26], methods based on model checking and program analysis [2,15,22,23], and program logics [3,4,21]. Several works have also focused on developing techniques to find violations of differential privacy [2, 5, 6, 23, 27]. Most of these works focus only on either verifying a program differentially private or finding violations. Exceptions are the recent works by Barthe et al. [2] and Wang et al. [23] (developed concurrently to our work), which propose methods that can instead address both.

Motivated by this picture, we propose a new technique named Coupled Relational Symbolic Execution (CRSE), which supports proving and finding violations of differential privacy. Our technique is based on two essential ingredients: relational symbolic execution [10] and approximate probabilistic couplings [3].

<sup>©</sup> The Author(s) 2021. N. Yoshida (Ed.): ESOP 2021, LNCS 12648, pp. 207–233, 2021. https://doi.org/10.1007/978-3-030-72019-3_8

**Relational Symbolic Execution.** Symbolic execution is a classic technique used for bug finding, testing, and proving. In symbolic execution, an evaluator executes the program on symbolic inputs instead of concrete ones. The evaluator follows, potentially, all the execution paths the program could take and collects constraints over the symbolic values corresponding to these paths. Similarly, relational symbolic execution [10] (RSE) is concerned with bug finding, testing, or proving for relational properties: properties about two executions of two potentially different programs. RSE executes two potentially different programs in a symbolic fashion and exploits relational assumptions about the inputs or the programs in order to reduce the number of states to analyze. This is effective when the code of the two programs shares some similarities, and when the property under consideration is relational in nature, as in the case of differential privacy.

**Approximate Probabilistic Couplings.** Probabilistic coupling is a proof technique for lifting a relation over the support of a joint distribution to a relation over the two marginals of the joint. This allows one to reason about relations between probability distributions by reasoning about relations on their supports, which can usually be done in a symbolic way. In this approach, the actual probabilistic reasoning is confined to the soundness of the verification system, rather than being spread throughout the verification effort. A relaxation of the notion of coupling, called approximate probabilistic coupling [3, 4], has been designed to reason about differential privacy. This can be seen as a regular probabilistic coupling with some additional parameters describing how close the two probability distributions are.

In this work, we combine these two approaches in a framework called Coupled Relational Symbolic Execution. In this framework, a program is executed in a relational and symbolic way. When some probabilistic primitive is executed, CRSE introduces constraints corresponding to the existence of an approximate probabilistic coupling on the output. These constraints are combined with the constraints on the execution traces generated by symbolically and relationally executing other non-probabilistic commands. These combined constraints can be exploited to reduce the number of states to analyze. When the execution is concluded CRSE checks whether there is a coupling between the two outputs, or whether there is some violation to the coupling. We show the soundness of this approach for both proving and refuting differential privacy. However, for finding violations, one cannot reason only symbolically, and since checking a coupling directly can be computationally expensive, we devise several heuristics which can be used to facilitate this task. Using these techniques, CRSE allows one to verify differential privacy for an interesting class of programs, including programs working on countable input and output domains, and to find violations to programs that are not differentially private.

CRSE is not a replacement for other techniques that have been proposed for the same task, it should be seen as an additional method to put in the set of tools of the privacy developer which provides a high level of generality. Indeed, by being a totally symbolic technique, it can leverage a plethora of current technologies such as SMT solvers, algebraic solvers, and numeric solvers.

Summarizing, the contributions of our work are:


Most of the proofs are omitted here; more details can be found in [9, 11].

# **2 CRSE Informally**

We will introduce CRSE through three examples of programs showing potential errors in implementations of differentially private algorithms. Informally, a randomized function A over a set of databases D is ε-differentially private (ε-DP) if it maps two databases D1 and D2 that differ in the data of a single individual (denoted D1 ∼ D2) to output distributions that are indistinguishable up to some value ε, usually referred to as the privacy budget. This is formalized by requiring that for every D1 ∼ D2 and for every u: Pr[A(D1) = u] ≤ e<sup>ε</sup> · Pr[A(D2) = u]. The smaller the ε, the more privacy is guaranteed.


**Input:** ε ∈ R<sup>+</sup>, x1, x2 ∈ {true, false}
**Precondition:** x1 = x2
**Postcondition:** o1 = o2 ∧ εc ≤ ε

1: o ←$ RRε(x)
2: **return** o

Fig. 1: Algorithm 1 is not ε-DP.

Randomized response with wrong noise. A standard primitive to achieve differential privacy when the data is a single boolean is randomized response [25]. We will use this (simplified) primitive to give an idea of how CRSE works. This primitive can actually be reduced to the primitives that CRSE uses, and so it won't be included in later sections. The primitive RRp(b) takes as input p ∈ (1/2, 1) and a boolean b, and outputs b with probability p and ¬b with probability 1 − p. By unfolding the definition of differential privacy, it is easy to see that this primitive is log[−p/(p − 1)]-DP. This is internalized in CRSE thanks to the existence of a log[−p/(p − 1)]-approximate lifting (Definition 2) of the equality relation = between the distributions RRp(b) and RRp(¬b). When CRSE executes line 1, it assumes that o1 = o2 and it sets a counter εc, representing the privacy budget required by the primitive, to log[−ε/(ε − 1)]. In order to check whether this program is actually ε-DP, it will then try to check whether this set of conditions implies the postcondition Ψ ≡ o1 = o2 ∧ εc ≤ ε. This implication will fail. Indeed, there are values of ε, say ε = 0.7, which give a value of εc that is actually greater than ε. This shows that the user may have confused the parameter ε with the parameter p that the randomized response primitive takes as input. If the user substituted line 1 with p ← e<sup>ε</sup>/(1 + e<sup>ε</sup>); o ←$ RRp(x), then CRSE would have considered the following conditions instead: o1 = o2 and εc = log[−p/(p − 1)] ∧ p = e<sup>ε</sup>/(1 + e<sup>ε</sup>). These conditions would then imply the postcondition Ψ, proving the correctness of the program.

The intuition behind this proof is that every time CRSE executes a random assignment of the form o ←$ RRp(x), it is allowed to assume that o1 = o2 as long as it spends a certain amount of privacy budget, i.e. log[−p/(p − 1)]. These assumptions are recorded in a set of constraints, which is then used to check whether it implies the condition that the two output variables are equal and the budget spent does not exceed ε. As a consequence of the definition of approximate lifting, this implies differential privacy (Lemma 2). If this fails, CRSE will provide a counterexample in the form of values for the inputs x1, x2, ε, p. Such counterexamples to the postcondition do not necessarily denote a counterexample to the privacy of the algorithm (as we will see later, the logic of couplings on which CRSE is based is not complete w.r.t. the differential privacy notion) but only potential candidates, and hence need to be further checked.
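The arithmetic behind this example can be checked directly. The following sketch (ours, using only the standard library) reproduces the failing and passing budget checks, using the identity log[−p/(p − 1)] = log[p/(1 − p)] for p in (1/2, 1):

```python
# Numeric check (ours) of the randomized-response budget reasoning:
# RR_p is log[p/(1-p)]-DP, since p/(1-p) is the worst-case ratio of
# output probabilities on adjacent boolean inputs.
import math

def rr_budget(p):
    """Privacy cost eps_c = log[-p/(p-1)] = log[p/(1-p)] of RR_p."""
    return math.log(p / (1 - p))

# Buggy program: the user passes eps itself as the probability p.
eps = 0.7
assert rr_budget(eps) > eps        # budget exceeded: postcondition fails

# Fixed program: p = e^eps / (1 + e^eps) makes the cost exactly eps.
p = math.exp(eps) / (1 + math.exp(eps))
assert math.isclose(rr_budget(p), eps)
```

With ε = 0.7, the buggy choice gives εc = log(0.7/0.3) ≈ 0.847 > ε, matching the counterexample CRSE reports.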


Two buggy Sparse Vector implementations. The next two examples are variations of the above-threshold algorithm, a component of the sparse vector technique, a classical technique which is still the subject of studies for improvement [7,14]. Given a numeric threshold, an array of numeric queries of length n, and a dataset, this algorithm returns the index of the first query whose result exceeds the threshold, and potentially it should also return the value of that query. This should be done in a way that preserves differential privacy. To do this in the right way, a program should add noise to the threshold, even though it is not sensitive data, add noise to each query, compare the values, and return the index of the first query for which this comparison succeeds. The noise that is usually added is sampled from the Laplace distribution, one of the main primitives in differential privacy. The analysis of this algorithm is rather subtle: it uses the noise on the threshold as a way to pay only once for all the queries that are below the threshold, and the noise on the queries to pay for the first and only query that is above the threshold, if any. Due to this complex analysis [16], this algorithm has been a benchmark for tools for reasoning about differential privacy [2, 3, 26].
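For concreteness, here is a minimal Python skeleton (ours, not the paper's PFOR code) of the correct above-threshold control flow. The noise source is injectable so the flow can be tested deterministically; in the real mechanism each call would draw fresh Laplace noise (e.g. scale 2/ε for the threshold in the standard analysis):

```python
# Skeleton (ours) of the above-threshold algorithm described in the
# text: noise the threshold once, noise each query, and return only
# the INDEX of the first noisy query above the noisy threshold.

def above_threshold(queries, threshold, noise):
    """noise: a zero-argument callable standing in for Laplace sampling."""
    t_hat = threshold + noise()           # noise the (non-sensitive) threshold
    for i, q in enumerate(queries):
        if q + noise() >= t_hat:          # noise each query result
            return i                      # index only: the value must not leak
    return None

# Deterministic check of the control flow with the noise zeroed out:
idx = above_threshold([1, 5, 9], threshold=6, noise=lambda: 0)  # idx == 2
```

The two bugs discussed next correspond to returning the noisy value alongside the index, and to dropping the per-query noise.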

Algorithm 2 has a bug making the (whole) program not differentially private for values of n greater than 4. The program initializes an array of outputs o to all bottom values, and a variable r to n + 1, which will be used as a guard in the main loop. It then adds noise to the threshold, and iterates over all the queries, adding noise to their results. If one of the noised results is above the noisy threshold, it saves the value in the array of outputs and updates the value of the guard variable, causing it to exit the main loop. Otherwise it keeps iterating. The bug is returning the value of the noisy query that is above the threshold and not only its index, as done by the instruction in red in line 6: this is indeed not enough for guaranteeing differential privacy. For n < 5 this program can be shown ε-differentially private by using the composition property of differential privacy, which says that the k-fold composition of ε-DP programs is kε-differentially private (Section 3). However, for n ≥ 5 the more sophisticated analysis we described above fails. The proof principle CRSE will use to try to show this program differentially private is to prove the assertion o1 = ι =⇒ o2 = ι ∧ εc ≤ ε, for every ι ≤ n; the soundness of this principle has been proved in [3]. That is, CRSE will try to prove the following assertions (which would prove the program ε-differentially private):

$$\bullet\ o_1 = [\hat{s}_1, \bot, \dots, \bot] \implies o_2 = [\hat{s}_1, \bot, \dots, \bot] \land \epsilon_c \le \epsilon$$

$$\bullet\ o_1 = [\bot, \hat{s}_1, \dots, \bot] \implies o_2 = [\bot, \hat{s}_1, \dots, \bot] \land \epsilon_c \le \epsilon$$

$$\bullet\ o_1 = [\bot, \dots, \hat{s}_1] \implies o_2 = [\bot, \dots, \hat{s}_1] \land \epsilon_c \le \epsilon$$

While proving the first assertion, CRSE will first couple, at line 3, the thresholds as t̂1 + k0 = t̂2, for k0 ≥ 1, where 1 is the sensitivity of the queries; this is needed to guarantee that all the query results below the threshold in one run stay below the threshold in the other run. It will then increase the privacy budget accordingly by k0 · ε/2. As a second step it will couple ŝ1 + k1 = ŝ2 in line 4. Now, the only way for the assertion o1 = [ŝ1, ⊥, ⊥] =⇒ o2 = [ŝ1, ⊥, ⊥] to hold is guaranteeing that both ŝ1 = ŝ2 and ŝ1 ≥ t̂1 =⇒ ŝ2 ≥ t̂2 hold. But these two assertions are not consistent with each other, because k0 ≥ 1. That is, the only way, using these coupling rules, to guarantee that the run on the right follows the same branches as the run on the left (this being necessary for proving the postcondition) is to couple the samples ŝ1 and ŝ2 so that they are different, necessarily implying the negation of the postcondition. This would not be the case if we were returning only the index of the query, since we can have that both the queries are above the threshold but return different values. Indeed, by substituting line 7 with an assignment that records only the index, the program can be proven ε-differentially private. So the refuting principle CRSE will use here is the one that finds a trace on the left run such that the only way the right run can be forced to follow it is by making the output variables different.

A second buggy example of the above-threshold algorithm is shown in Figure 3. In this example, in the body of the loop, the test is performed between the noisy threshold and the actual value of the query on the database; that is, we don't add noise to the query. CRSE will use for this example another refuting principle, based on reachability. In particular, it will vacuously couple the two thresholds at line 1; that is, it will not introduce any relation between t̂1 and t̂2. CRSE will then search for a trace which is satisfiable in the first run but not in the second one. This translates into an output event which has positive probability in the first run but probability 0 in the second one, leading to an unbounded privacy loss and making the algorithm not ε-differentially private for any finite ε. Interestingly, this unbounded privacy loss can be achieved with just 2 iterations.

# **3 Preliminaries**

Let A be a denumerable set. A subdistribution over A is a function μ : A → [0, 1] with weight ∑<sub>a∈A</sub> μ(a) less than or equal to 1. We denote the set of subdistributions over A by **sdistr**(A). When a subdistribution has weight equal to 1, we call it a distribution, and we denote the set of distributions over A by **distr**(A). The null subdistribution μ0 : A → [0, 1] assigns mass 0 to every element of A. The Dirac distribution **unit**(a) : A → [0, 1] is defined for a ∈ A by **unit**(a)(x) ≡ 1 if x = a, and **unit**(a)(x) ≡ 0 otherwise. The set of subprobability distributions can be given the structure of a monad, with unit the function **unit**. We also have a function **bind** ≡ λμ.λf.λa. ∑<sub>b</sub> μ(b) · f(b)(a), allowing us to compose subdistributions (as we compose monads). We will use the notion of ε-divergence Δ<sub>ε</sub>(μ1, μ2) between two subdistributions μ1, μ2 ∈ **sdistr**(A) to define approximate couplings; it is defined as: Δ<sub>ε</sub>(μ1, μ2) ≡ sup<sub>E⊆A</sub> (μ1(E) − exp(ε) · μ2(E)).
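These definitions can be modelled concretely. The following sketch (ours) represents subdistributions as finite dictionaries and computes Δ<sub>ε</sub> exactly by enumerating all events E over the joint support:

```python
# A small executable model (ours) of the definitions above:
# subdistributions as dicts, with unit, bind, and the eps-divergence.
from itertools import chain, combinations
from math import exp, isclose

def unit(a):
    return {a: 1.0}                      # Dirac distribution at a

def bind(mu, f):
    """Monadic composition: weight mu(b) flows through f(b)."""
    out = {}
    for b, w in mu.items():
        for a, v in f(b).items():
            out[a] = out.get(a, 0.0) + w * v
    return out

def divergence(eps, mu1, mu2):
    """Delta_eps(mu1, mu2) = sup_E (mu1(E) - e^eps * mu2(E))."""
    supp = sorted(set(mu1) | set(mu2))
    events = chain.from_iterable(
        combinations(supp, r) for r in range(len(supp) + 1))
    return max(sum(mu1.get(a, 0.0) for a in E)
               - exp(eps) * sum(mu2.get(a, 0.0) for a in E)
               for E in events)

die = {i: 1 / 6 for i in range(1, 7)}        # fair six-sided die
double = bind(die, lambda i: unit(2 * i))    # pushforward along i -> 2i
```

Enumerating subsets is exponential, so this is only feasible on small supports; it is meant to make the definitions concrete, not to be efficient.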

Formally, differential privacy is a property of a probabilistic program:

**Definition 1 (Differential Privacy [8]).** Let ε ≥ 0 and ∼ ⊆ D × D. A program A : D → **distr**(O) is ε-differentially private with respect to ∼ iff ∀D ∼ D′. ∀u ∈ O:

$$\Pr[\mathcal{A}(D) = u] \le e^{\epsilon} \Pr[\mathcal{A}(D') = u].$$

The adjacency relation ∼ over the set of databases D models which pairs of input databases should be indistinguishable to an adversary. In its most classical definition, ∼ relates databases that differ in one record in terms of Hamming distance. Differentially private programs can be composed [8]: given programs A1 and A2, respectively ε1- and ε2-differentially private, their sequential composition A(D) ≡ A2(A1(D), D) is (ε1 + ε2)-differentially private. We say that a function f : D → Z is k-sensitive if |f(x) − f(y)| ≤ k for all x ∼ y. Functions with bounded sensitivity can be made differentially private by adding Laplace noise:

**Lemma 1 (Laplace Mechanism [8]).** Let ε > 0, and assume that f : D → Z is a k-sensitive function with respect to ∼ ⊆ D × D. Then the randomized algorithm mapping D to f(D) + ν, where ν is sampled from a discrete version of the Laplace distribution with scale 1/ε, is kε-differentially private w.r.t. ∼.
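Lemma 1 can be sanity-checked numerically. The sketch below (ours) uses a standard discrete Laplace pmf proportional to exp(−ε|z − μ|) and verifies the pointwise e<sup>kε</sup> bound on a finite window of outcomes:

```python
# Finite-precision check (ours) of the Laplace mechanism bound: for a
# k-sensitive f, the output pmfs on adjacent inputs have means at
# distance at most k, so every point's probability ratio is <= e^(k*eps).
from math import exp, isclose

def disc_laplace_pmf(z, mu, eps):
    """Discrete Laplace, scale 1/eps: (1-a)/(1+a) * a^|z-mu|, a = e^-eps."""
    a = exp(-eps)
    return (1 - a) / (1 + a) * a ** abs(z - mu)

eps, k = 0.5, 3
mu1, mu2 = 10, 10 + k                    # |f(D1) - f(D2)| <= k
ratio = max(disc_laplace_pmf(z, mu1, eps) / disc_laplace_pmf(z, mu2, eps)
            for z in range(-50, 80))     # worst-case pointwise ratio
```

The maximum ratio is attained on the side of the smaller mean, where the exponent difference is exactly −k, giving e<sup>kε</sup>.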

The notion of approximate probabilistic coupling is internalized by the notion of approximate lifting [3].

**Definition 2.** Given μ1 ∈ **distr**(A), μ2 ∈ **distr**(B), a relation Ψ ⊆ A × B, and ε ∈ R, we say that μ1, μ2 are related by the ε-approximate lifting of Ψ, denoted μ1 {Ψ}<sup>ε</sup> μ2, iff there exist μL, μR ∈ **distr**(A × B) such that: 1) λa. ∑<sub>b</sub> μL(a, b) = μ1 and λb. ∑<sub>a</sub> μR(a, b) = μ2; 2) {(a, b) | μL(a, b) > 0 ∨ μR(a, b) > 0} ⊆ Ψ; 3) Δ<sub>ε</sub>(μL, μR) ≤ 0.

Approximate lifting satisfies the following fundamental property [3]:

**Lemma 2.** Let μ1, μ2 ∈ **distr**(A), ε ≥ 0. Then Δ<sub>ε</sub>(μ1, μ2) ≤ 0 iff μ1 {=}<sup>ε</sup> μ2.

From Lemma 2 we have that an algorithm A is ε-differentially private w.r.t. ∼ iff A(D1) {=}<sup>ε</sup> A(D2) for all D1 ∼ D2. The next lemma [3], finally, casts the Laplace mechanism in terms of couplings:

**Lemma 3.** Let L<sub>v1,b</sub> and L<sub>v2,b</sub> be two Laplace random variables with means v1 and v2, respectively, and scale b. Then

$$L_{v_1,b} \,\{(z_1, z_2) \in \mathbb{Z} \times \mathbb{Z} \mid z_1 + k = z_2\}^{|k + v_1 - v_2|\epsilon}\, L_{v_2,b},$$

for all k ∈ Z, ε ≥ 0.

# **4 Concrete languages**

In this section we sketch the two CRSE concrete languages, the unary one PFOR and the relational one RPFOR. These will be the basis on which we will design our symbolic languages in the next section.

### **4.1 PFOR**

PFOR is a basic FOR-like language with arrays, to represent databases and other data structures, and probabilistic sampling from the Laplace distribution. The full syntax is standard and is presented in full in the extended version [11]; in the following we use a simplified syntax:

$$\mathcal{C} \ni c ::= \mathtt{skip} \mid c; c \mid x \leftarrow e \mid x \xleftarrow{\\$} \mathsf{lap}\_e(e) \mid \mathtt{if} \; e \; \mathtt{then} \; c \; \mathtt{else} \; c \mid \ldots$$

The set of commands C includes assignments, the skip command, sequencing, branching, and (not shown) array assignments and a looping construct. Finally, we also include a primitive instruction x ←$ lap<sub>e2</sub>(e1) to model random sampling from the Laplace distribution. Arithmetic expressions e ∈ E are built out of integers, array accesses and lengths, and elements of X<sub>p</sub>. The set X<sub>p</sub> contains values denoting random expressions, that is, values coming from a random assignment or arithmetic expressions involving such values. We will use capital letters such as X, Y, ... to range over X<sub>p</sub>. The set of values is V ≡ Z ∪ X<sub>p</sub>. In Figure 2, we introduce a grammar of constraints for random expressions, where X ranges over X<sub>p</sub> and n, n1, n2 ∈ Z. The simple constraints in the syntactic categories ra and re record that a random value is either associated with a specific distribution, or that the computation is conditioned on some random expression being greater than 0 or less than or equal to 0. The latter constraints, as we will see, come from branching instructions. We treat the constraint lists p, p′ in Figure 2 as lists of simple constraints and hence, from now on, we will use the infix operators :: and @, respectively, for appending a simple constraint to a constraint list and for concatenating two constraint lists. The symbol [] denotes the empty list of probabilistic constraints. Environments in the set M, or probabilistic memories, map program variables to values in V and array names to elements of **Array** ≡ ⋃<sub>i</sub> V<sup>i</sup>, so the type of a memory m ∈ M is V → V ∪ A → **Array**. We will distinguish between probabilistic concrete memories in M and concrete memories in the set M<sub>c</sub> ≡ V → Z ∪ A → ⋃<sub>i</sub> Z<sup>i</sup>. Probabilistic concrete memories are meant to denote subdistributions over the set of concrete memories M<sub>c</sub>.

$$\begin{aligned} ra &::= X \xleftarrow{\$} \mathrm{lap}_{n_2}(n_1) \\ re &::= n \mid X \mid re \oplus re \\ \mathcal{P} \ni p &::= X = re \mid re > 0 \mid re \le 0 \mid ra \mid p :: p \mid [\,] \end{aligned}$$

Fig. 2: Probabilistic constraints
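As an illustration, the command syntax and the value space above can be sketched as a small Python AST. All class names here are our own choices for illustration, not part of the paper's formalism:

```python
from dataclasses import dataclass

# Hypothetical AST for a fragment of PFOR (names are ours, not the paper's).
@dataclass(frozen=True)
class Skip:                  # skip
    pass

@dataclass(frozen=True)
class Assign:                # x <- e
    var: str
    expr: object

@dataclass(frozen=True)
class LapAssign:             # x <-$ lap_{e2}(e1): sample from Laplace(mean e1, scale e2)
    var: str
    mean: object
    scale: object

@dataclass(frozen=True)
class Seq:                   # c1; c2
    first: object
    second: object

@dataclass(frozen=True)
class If:                    # if e then c1 else c2
    guard: object
    then_branch: object
    else_branch: object

@dataclass(frozen=True)
class PSym:                  # an element of X_p: a value denoting a random expression
    name: str

def is_value(v) -> bool:
    # The set of values is V = Z ∪ X_p
    return isinstance(v, (int, PSym))
```

A command such as `x <-$ lap_2(0); if x then skip else skip` would be built as `Seq(LapAssign("x", 0, 2), If("x", Skip(), Skip()))`.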

Expressions in PFOR are given meaning through a big-step evaluation semantics specified by a judgment of the form $\langle m, e, p\rangle \downarrow_c \langle v, p'\rangle$, where $m \in \mathcal{M}$, $e \in \mathcal{E}$, $p, p' \in \mathcal{P}$, $v \in \mathcal{V}$. The judgment reads as: expression $e$ reduces to the value $v$ and probabilistic constraints $p'$ in an environment $m$ with probabilistic concrete constraints $p$. We omit the rules for this judgment here, but we will present similar rules for the symbolic languages in the next section. Commands are given

$$\textbf{if-false}\quad \frac{\langle m,e,p\rangle \downarrow_{c}\langle v,p'\rangle \quad v \in \mathbb{Z} \quad v \leq 0}{\langle m,\mathtt{if}\ e\ \mathtt{then}\ c_{1}\ \mathtt{else}\ c_{2},p\rangle \rightarrow_{c} \langle m,c_{2},p'\rangle}$$

$$\textbf{if-true-prob}\quad \frac{\langle m,e,p\rangle \downarrow_{c}\langle X,p'\rangle \quad X \in \mathcal{X}_p \quad p'' \equiv p'\,@\,[X > 0]}{\langle m,\mathtt{if}\ e\ \mathtt{then}\ c_{1}\ \mathtt{else}\ c_{2},p\rangle \rightarrow_{c} \langle m,c_{1},p''\rangle}$$

$$\textbf{lap-ass}\quad \frac{\begin{array}{c}\langle m,e_1,p\rangle \downarrow_{c}\langle n_{1},p_{1}\rangle \quad \langle m,e_{2},p_{1}\rangle \downarrow_{c}\langle n_{2},p_{2}\rangle \quad n_{2} > 0\\ X\ \mathbf{fresh}(\mathcal{X}_{p}) \quad p' \equiv p_{2}\,@\,[X \xleftarrow{\$} \mathrm{lap}_{n_{2}}(n_{1})]\end{array}}{\langle m,\,x \xleftarrow{\$} \mathrm{lap}_{e_{2}}(e_{1}),\,p\rangle \rightarrow_{c} \langle m[x\mapsto X],\mathtt{skip},p'\rangle}$$

Fig. 3: PFOR selected rules

meaning through a small-step evaluation semantics specified by a judgment of

the form $\langle m, c, p\rangle \rightarrow_c \langle m', c', p'\rangle$, where $m, m' \in \mathcal{M}$, $c, c' \in \mathcal{C}$, $p, p' \in \mathcal{P}$. The judgment reads as: the probabilistic concrete configuration $\langle m, c, p\rangle$ steps into the probabilistic concrete configuration $\langle m', c', p'\rangle$. Figure 3 shows a selection of the rules defining this judgment. Most of the rules are self-explanatory, so we only describe the non-standard ones. Rule **lap-ass** handles the random assignment. It evaluates the mean $e_1$ and the scale $e_2$ of the distribution and checks that $e_2$ actually denotes a positive number. The semantic predicate **fresh** asserts that the first argument is drawn nondeterministically from the second argument and was never used before in the computation. Notice that if one of these two expressions reduces to a probabilistic symbolic value, the computation halts. Rule **if-true-prob** (and **if-false-prob**) reduces the guard of a branching instruction to a value. If the value is a probabilistic symbolic value, the rule nondeterministically chooses one of the two branches, recording the choice made in the list of probabilistic constraints. If instead the value of the guard is a numerical constant, the right branch is chosen deterministically using the rules **if-false** and **if-true** (not shown).
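The behavior of **lap-ass** and the probabilistic branching rules can be mimicked in a few lines of Python; this is our own simplification (symbols in $\mathcal{X}_p$ are plain strings and expressions are assumed to be already evaluated), not the paper's definition:

```python
import itertools

_fresh = itertools.count()

def fresh_psym() -> str:
    # "X fresh(X_p)": a symbol never used before in the computation
    return f"X{next(_fresh)}"

def step_lap_assign(mem, var, n1, n2, p):
    """Rule lap-ass: mean n1 and scale n2 must be integers, with n2 > 0.
    Returns (memory', constraints') or None when the computation halts."""
    if not (isinstance(n1, int) and isinstance(n2, int)):
        return None            # a probabilistic symbolic operand halts the run
    if n2 <= 0:
        return None
    X = fresh_psym()
    mem2 = dict(mem)
    mem2[var] = X
    return mem2, p + [("lap", X, n1, n2)]   # p' = p @ [X <-$ lap_{n2}(n1)]

def step_if(mem, guard_val, c1, c2, p):
    """Deterministic branch on ints; nondeterministic on X_p symbols.
    Returns the list of possible (continuation, constraints) pairs."""
    if isinstance(guard_val, int):
        return [(c1, p) if guard_val > 0 else (c2, p)]
    # probabilistic symbolic guard: take both branches, recording the choice
    return [(c1, p + [(">0", guard_val)]), (c2, p + [("<=0", guard_val)])]
```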

We call a probabilistic concrete configuration of the form $\langle m, \mathtt{skip}, p\rangle$ final. A set of concrete configurations $D$ is called final, denoted $\mathbf{Final}(D)$, if all its configurations are final. We will use this predicate also for sets of sets of concrete configurations, with the obvious lifted meaning. As is clear from the rules, a run of a PFOR program can generate many different final concrete configurations. A different judgment of the form $D \Rightarrow_c D'$, where $D, D' \in \mathcal{P}(\mathcal{M}\times \mathcal{C} \times \mathcal{P})$, and in particular its transitive and reflexive closure ($\Rightarrow^*_c$), will help us collect all the possible final configurations stemming from a computation. We have only one rule defining this judgment:

#### **Sub-distr-step**

$$\frac{\langle m,c,p\rangle \in D \qquad D' \equiv \big(D \setminus \{\langle m,c,p\rangle\}\big) \cup \{\langle m',c',p'\rangle \mid \langle m,c,p\rangle \to_c \langle m',c',p'\rangle\}}{D \Rightarrow_c D'}$$

Rule **Sub-distr-step** nondeterministically selects a configuration $s = \langle m, c, p\rangle$ from $D$, removes $s$ from it, and adds to $D$ all the configurations $s'$ that are reachable from $s$.
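This collecting judgment can be sketched as a worklist loop. The sketch below assumes a `step` function returning the one-step successors of a configuration, and encodes configurations as tuples; both choices are ours, for illustration only:

```python
# Sketch of the Sub-distr-step collecting judgment: repeatedly replace one
# non-final configuration by all its one-step successors until only final
# configurations (command == "skip") remain.

def is_final(config) -> bool:
    _mem, cmd, _p = config
    return cmd == "skip"

def collect_finals(initial, step):
    configs = {initial}
    while not all(is_final(c) for c in configs):
        s = next(c for c in configs if not is_final(c))   # nondeterministic pick
        configs = (configs - {s}) | set(step(s))          # D' = (D \ {s}) ∪ succ(s)
    return configs
```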

In Section 3 we defined the notions of lifting, coupling, and differential privacy using subdistributions in the form of functions from a set of atomic events to the interval $[0, 1]$. The semantics of the languages proposed so far, though, only deals with subdistributions represented as sets of concrete probabilistic configurations. We now show how to map the latter to the former. In Figure 4 we define a translation function $(\!|\cdot\,;\cdot|\!)_{\mathbf{mp}}$, together with auxiliary functions, between a single probabilistic concrete configuration and a subdistribution defined using the $\mathbf{unit}(\cdot)/\mathbf{bind}(\cdot,\cdot)$ constructs. We make use of the constant subdistribution $\mu_0$, which maps every element to mass 0 and is usually referred to as the null subdistribution; moreover, by $\mathrm{lap}_{n_2}(n_1)(z)$ we denote the mass of the (discrete version of the) Laplace distribution centered at $n_1$ with scale $n_2$ at the point $z$.

The idea of the translation is that we can transform a probabilistic concrete memory m<sup>s</sup> ∈ M into a distribution over fully concrete memories in M<sup>c</sup> by

$$\begin{aligned}
(\!| m_s; p |\!)_{\mathbf{mp}} &= \mathbf{bind}(\llbracket p\rrbracket_{\mathbf{p}},\, \lambda s_o.\,\mathbf{unit}(s_o(m_s)))\\
\llbracket [\,]\rrbracket_{\mathbf{p}} &= \mathbf{unit}([\,])\\
\llbracket X = re :: p\rrbracket_{\mathbf{p}} &= \mathbf{bind}(\llbracket p\rrbracket_{\mathbf{p}},\, \lambda s_o.\,\mathbf{bind}(\llbracket re\rrbracket_{\mathbf{re}}^{s_o},\, \lambda z_o.\,\mathbf{unit}(X = z_o :: s_o)))\\
\llbracket re > 0 :: p\rrbracket_{\mathbf{p}} &= \mathbf{bind}(\llbracket p\rrbracket_{\mathbf{p}},\, \lambda s_o.\,\mathbf{bind}(\llbracket re\rrbracket_{\mathbf{re}}^{s_o},\, \lambda z_o.\,\mathtt{if}\ (z_o > 0)\ \mathtt{then}\ \mathbf{unit}(s_o)\ \mathtt{else}\ \mu_0))\\
\llbracket re \le 0 :: p\rrbracket_{\mathbf{p}} &= \mathbf{bind}(\llbracket p\rrbracket_{\mathbf{p}},\, \lambda s_o.\,\mathbf{bind}(\llbracket re\rrbracket_{\mathbf{re}}^{s_o},\, \lambda z_o.\,\mathtt{if}\ (z_o \le 0)\ \mathtt{then}\ \mathbf{unit}(s_o)\ \mathtt{else}\ \mu_0))\\
\llbracket \mathrm{lap}_{n_2}(n_1)\rrbracket_{\mathbf{re}}^{s} &= \lambda z.\,\mathrm{lap}_{n_2}(n_1)(z)\\
\llbracket n\rrbracket_{\mathbf{re}}^{s} &= \mathbf{unit}(n) \qquad \llbracket X\rrbracket_{\mathbf{re}}^{s} = \mathbf{unit}(s(X))\\
\llbracket re_1 \oplus re_2\rrbracket_{\mathbf{re}}^{s} &= \mathbf{bind}(\llbracket re_1\rrbracket_{\mathbf{re}}^{s},\, \lambda v_1.\,\mathbf{bind}(\llbracket re_2\rrbracket_{\mathbf{re}}^{s},\, \lambda v_2.\,\mathbf{unit}(v_1 \oplus v_2)))
\end{aligned}$$

Fig. 4: From probabilistic configurations to subdistributions

sampling from the distributions of the probabilistic variables defined in $m_s$, in the order in which they were declared, which is specified by the probabilistic path constraints. To do this, we first build a substitution for the probabilistic variables which maps them to integers, and then we perform the substitution on $m_s$. Given a set of probabilistic concrete memories, we can then turn them into a subdistribution by summing up the translations of the single probabilistic configurations. Indeed, given two subdistributions $\mu_1, \mu_2$ defined over the same set, we can always define the subdistribution $\mu_1 + \mu_2$ by the mapping $(\mu_1 + \mu_2)(a) = \mu_1(a) + \mu_2(a)$.
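The $\mathbf{unit}/\mathbf{bind}$ constructs, the null subdistribution, and pointwise sum can be sketched over finite dictionaries mapping outcomes to mass. The finite-support `discrete_laplace` below is only an illustrative stand-in for $\mathrm{lap}_{n_2}(n_1)$, and all names are ours:

```python
from math import exp

def unit(x):
    # Dirac subdistribution: all mass on x
    return {x: 1.0}

def bind(mu, f):
    # Monadic bind: push each outcome of mu through the kernel f
    out = {}
    for a, w in mu.items():
        for b, v in f(a).items():
            out[b] = out.get(b, 0.0) + w * v
    return out

def null():
    # mu_0: maps every element to mass 0
    return {}

def discrete_laplace(mean, scale, support):
    # Normalised two-sided geometric weights on a finite support; a finite
    # stand-in for lap_{n2}(n1), used only for illustration.
    w = {z: exp(-abs(z - mean) / scale) for z in support}
    total = sum(w.values())
    return {z: m / total for z, m in w.items()}

def add(mu1, mu2):
    # (mu1 + mu2)(a) = mu1(a) + mu2(a)
    keys = set(mu1) | set(mu2)
    return {a: mu1.get(a, 0.0) + mu2.get(a, 0.0) for a in keys}
```

Conditioning as in Figure 4 then drops mass: `bind(mu, lambda z: unit(z) if z > 0 else null())` yields a subdistribution with total mass strictly below 1.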

The following lemma states an equivalence between these two representations of probability subdistributions. The hypothesis of the lemma involves a well-formedness judgment, $m \vdash p$, which has not been specified for lack of space but can be found in the extended version [11]; it deals with well-formedness of the probabilistic path constraint $p$ with respect to the probabilistic concrete memory $m$.

**Lemma 4.** If $m \vdash p$ and $\{\langle m, c, p\rangle\} \Rightarrow^*_c \{\langle m_1, \mathtt{skip}, p_1\rangle,\ldots,\langle m_n, \mathtt{skip}, p_n\rangle\}$ then $\mathbf{bind}((\!| m; p |\!)_{\mathbf{mp}}, \llbracket c\rrbracket_{\mathcal{C}}) = \sum_{i=1}^{n} (\!| m_i; p_i |\!)_{\mathbf{mp}}$.

This lemma justifies the following definition for the semantics of a program.

**Definition 3.** The semantics of a program $c$ executed on memory $m_0$ and probabilistic path constraint $p_0$ is $\llbracket c\rrbracket_{\mathcal{C}}(m_0, p_0) \equiv \sum_{\langle m,\mathtt{skip},p\rangle\in D} (\!| m; p |\!)_{\mathbf{mp}}$, when $\{\langle m_0, c, p_0\rangle\} \Rightarrow^*_c D$, $\mathbf{Final}(D)$, and $m_0 \vdash p_0$. If $p_0 = [\,]$ we write $\llbracket c\rrbracket_{\mathcal{C}}(m_0)$.

### **4.2 RPFOR**

In order to reason about differential privacy, we build on top of PFOR a relational language called RPFOR with a relational semantics dealing with pairs of traces. Intuitively, an execution of a single RPFOR program represents the execution of two PFOR programs. Inspired by the approach of [19], we extend the grammar of PFOR with a pair constructor $\langle\cdot|\cdot\rangle$ which can be used at the level of values $\langle v_1|v_2\rangle$, expressions $\langle e_1|e_2\rangle$, or commands $\langle c_1|c_2\rangle$, where $c_i, e_i, v_i$ for $i \in \{1, 2\}$ are commands, expressions, and values in PFOR. This entails that pairs cannot be nested. This syntactic invariant is preserved by the rules handling the branching instruction. Pair constructs are used to indicate where commands, values, or expressions might differ in the two unary executions represented by a single RPFOR execution. The sets of expressions and commands in RPFOR, $\mathcal{E}_r$ and $\mathcal{C}_r$, are generated by the grammars:

$$\mathcal{E}_{r} \ni e_r ::= v \mid e \mid \langle e_1 | e_2 \rangle \qquad \mathcal{C}_{r} \ni c_r ::= x \leftarrow e_r \mid x \xleftarrow{\$} \mathrm{lap}_{e_r}(e_r) \mid c \mid \langle c_1 | c_2 \rangle$$

where $v \in \mathcal{V}_r$, $e, e_1, e_2 \in \mathcal{E}$, $c, c_1, c_2 \in \mathcal{C}$. Values can now also be pairs of unary values, that is, $\mathcal{V}_r \equiv \mathcal{V}\cup\mathcal{V}^2$.

To define the semantics of RPFOR, we first extend memories to allow program variables to map to pairs of integers, and array variables to map to pairs of arrays. In the following, we will use the projection functions $\lfloor\cdot\rfloor_i$ for $i \in \{1, 2\}$, which project, respectively, the first (left) and second (right) elements of a pair construct (i.e., $\lfloor\langle c_1|c_2\rangle\rfloor_i = c_i$ and $\lfloor\langle e_1|e_2\rangle\rfloor_i = e_i$, with $\lfloor v\rfloor_i = v$ when $v \in \mathcal{V}$), and are homomorphic on other constructs.
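The projection functions can be sketched over a tuple-based encoding of RPFOR phrases; the encoding is ours, and among composite commands only sequencing is shown:

```python
# ⌊·⌋_i on RPFOR phrases: a pair construct projects to its components, unary
# phrases project to themselves, and the function is homomorphic on composite
# commands (here only Seq, encoded as ("seq", c1, c2); pairs as ("pair", l, r)).

def proj(phrase, i):
    assert i in (1, 2)
    if isinstance(phrase, tuple) and phrase[0] == "pair":   # <l | r>
        return phrase[i]
    if isinstance(phrase, tuple) and phrase[0] == "seq":    # c1; c2
        return ("seq", proj(phrase[1], i), proj(phrase[2], i))
    return phrase                                           # unary value/command
```

Because pairs cannot be nested, a single recursive pass like this suffices: under a `pair` node only unary phrases occur.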

The semantics of expressions in RPFOR is specified through the judgment $\langle m_1, m_2, e, p_1, p_2\rangle \downarrow_{rc} \langle v, p'_1, p'_2\rangle$, where $m_1, m_2 \in \mathcal{M}$, $p_1, p_2, p'_1, p'_2 \in \mathcal{P}$, $e \in \mathcal{E}_r$, $v \in \mathcal{V}_r$. Similarly, for commands, we have the judgment $\langle m_1, m_2, c, p_1, p_2\rangle \rightarrow_{rc} \langle m'_1, m'_2, c', p'_1, p'_2\rangle$. Again, we use the predicate $\mathbf{Final}(\cdot)$ for configurations $\langle m_1, m_2, c, p_1, p_2\rangle$ such that $c = \mathtt{skip}$, and lift the predicate to sets of configurations as well. Intuitively, a relational probabilistic concrete configuration $\langle m_1, m_2, c, p_1, p_2\rangle$ denotes a pair of probabilistic concrete states, that is, a pair of subdistributions over the space of concrete memories. In Figure 5 a selection of the rules defining the judgments is presented. Most of the rules are quite natural. Notice how branching instructions combine both probabilistic and relational nondeterminism.

#### **r-if-conc-conc-true-false**

$$\frac{\langle m_1,m_2,e,p_1,p_2\rangle \downarrow_{rc} \langle v,p'_1,p'_2\rangle \quad \lfloor v\rfloor_1,\lfloor v\rfloor_2 \in \mathbb{Z} \quad \lfloor v\rfloor_1 > 0 \quad \lfloor v\rfloor_2 \le 0}{\langle m_1,m_2,\mathtt{if}\ e\ \mathtt{then}\ c_1\ \mathtt{else}\ c_2,p_1,p_2\rangle \rightarrow_{rc} \langle m_1,m_2,\langle \lfloor c_1\rfloor_1|\lfloor c_2\rfloor_2\rangle,p'_1,p'_2\rangle}$$

**r-if-prob-prob-true-false**

$$\frac{\langle m_1,m_2,e,p_1,p_2\rangle \downarrow_{rc} \langle v,p'_1,p'_2\rangle \quad \lfloor v\rfloor_1,\lfloor v\rfloor_2 \in \mathcal{X}_p}{\langle m_1,m_2,\mathtt{if}\ e\ \mathtt{then}\ c_1\ \mathtt{else}\ c_2,p_1,p_2\rangle \rightarrow_{rc} \langle m_1,m_2,\langle \lfloor c_1\rfloor_1|\lfloor c_2\rfloor_2\rangle,\ p'_1\,@\,[\lfloor v\rfloor_1 > 0],\ p'_2\,@\,[\lfloor v\rfloor_2 \le 0]\rangle}$$

**r-pair-step**

$$\frac{\{i, j\} = \{1, 2\} \quad \langle m_i, c_i, p_i\rangle \rightarrow_c \langle m'_i, c'_i, p'_i\rangle \quad c'_j = c_j \quad p'_j = p_j \quad m'_j = m_j}{\langle m_1, m_2,\langle c_1|c_2\rangle, p_1, p_2\rangle \rightarrow_{rc} \langle m'_1, m'_2,\langle c'_1|c'_2\rangle, p'_1, p'_2\rangle}$$

Fig. 5: RPFOR selected rules

So, as in the case of PFOR, we collect sets of relational configurations using the judgment $R \Rightarrow_{rc} R'$, with $R, R' \in \mathcal{P}(\mathcal{M}\times\mathcal{M}\times\mathcal{C}_r \times \mathcal{P} \times \mathcal{P})$, defined by only one rule:

### **SUB-PDISTR-STEP**

$$
\frac{\begin{array}{c}
\langle m_1, m_2, c, p_1, p_2 \rangle \in R \\
R_t \equiv \{ \langle m'_1, m'_2, c', p'_1, p'_2 \rangle \mid \langle m_1, m_2, c, p_1, p_2 \rangle \to_{rc} \langle m'_1, m'_2, c', p'_1, p'_2 \rangle \} \\
R' \equiv \big( R \setminus \{ \langle m_1, m_2, c, p_1, p_2 \rangle \} \big) \cup R_t
\end{array}}{R \Rightarrow_{rc} R'}
$$

This rule nondeterministically picks and removes one relational configuration from a set and adds to it all those configurations that are reachable from it. As mentioned before, a run of a program in RPFOR corresponds to the execution of two runs of the program in PFOR. Before making this precise, we extend projection functions to relational configurations in the following way: $\lfloor\langle m_1, m_2, c, p_1, p_2\rangle\rfloor_i = \langle m_i, \lfloor c\rfloor_i, p_i\rangle$, for $i \in \{1, 2\}$. Projection functions extend in the obvious way also to sets of relational configurations. We are now ready to state the following lemma relating executions in RPFOR to those in PFOR:

**Lemma 5.** Let $i \in \{1, 2\}$; then $R \Rightarrow^*_{rc} R'$ iff $\lfloor R\rfloor_i \Rightarrow^*_c \lfloor R'\rfloor_i$.

# **5 Symbolic languages**

In this section we lift the concrete languages presented in the previous section to their symbolic versions (respectively, SPFOR and SRPFOR) by extending them with symbolic values $X \in \mathcal{X}$. We intentionally use the same metavariables for symbolic values in $\mathcal{X}$ and $\mathcal{X}_p$, since they both represent symbolic values of some sort. However, we assume $\mathcal{X}_p \cap \mathcal{X} = \emptyset$: this is because we want symbolic values in $\mathcal{X}$ to denote only unknown sets of integers, rather than sets of probability distributions. The meaning of $X$ should then be clear from the context.

### **5.1 SPFOR**

SPFOR expressions extend PFOR expressions with symbolic values $X \in \mathcal{X}$. Commands in SPFOR are the same as in PFOR, but now symbolic values can appear in expressions.

In order to collect constraints on symbolic values, we extend configurations with a set of constraints over integer values, drawn from the set $\mathcal{S}$ (Figure 6a), not to be confused with probabilistic path constraints (Figure 6b). The former express constraints over integer values, for instance parameters of the distributions. In particular, constraint expressions include standard arithmetic expressions with values being symbolic or integer constants, and array selection. Probabilistic path constraints can now also contain symbolic integer values. Hence,

$$\begin{array}{ll}
\mathcal{S}_e \ni e ::= n \mid X \mid i \mid e \oplus e \mid \mathtt{store}(e,e,e) \mid \mathtt{select}(e,e) & \quad sre ::= n \mid X \mid Y \mid sre \oplus sre\\
\mathcal{S} \ni s ::= \top \mid e \diamond e \mid s \wedge s \mid \neg s \mid \forall i.\, s &\\
\text{(a) Symbolic constraints. } X \in \mathcal{X},\ n \in \mathbb{Z}. & \quad \text{(b) Prob. constraints. } X \in \mathcal{X},\ Y \in \mathcal{X}_p
\end{array}$$

Fig. 6: Grammar of constraints

probabilistic path constraints now can be symbolic. This is needed to address examples branching on probabilistic values, such as the Above Threshold algorithm we discussed in Section 2.

Memories can now contain symbolic values, and we represent arrays in memory as pairs $(X, v)$, where $v$ is a (concrete or symbolic) integer value representing the length of the array and $X$ is a symbolic value representing the array content. The content of the arrays is kept and refined in the set of constraints by means of the $\mathtt{select}(\cdot, \cdot)$ and $\mathtt{store}(\cdot, \cdot, \cdot)$ operations. The semantics of expressions is captured by the judgment $(m, e, p, s) \downarrow_{\mathrm{SP}} (v, p', s')$, which now includes a set of constraints over integers. The rules of the judgment are fully described in the extended version [11]; we briefly describe a selection of them. Rule **S-P-Op-2** applies when both operands of an arithmetic operation reduce to elements in $\mathcal{X}_p$; it updates the set of probabilistic constraints accordingly. Rule **S-P-Op-5** instead fires when one operand is an integer and the other is a symbolic value: in this case only the set of symbolic constraints needs to be updated. Finally, in rule **S-P-Op-6** one of the operands reduces to an element in $\mathcal{X}_p$ and the other to an element in $\mathcal{X}$. We only update the list of probabilistic constraints, as integer constraints cannot contain symbols in $\mathcal{X}_p$.

The semantics of commands of SPFOR is described by small-step semantics judgments of the form $(m, c, p, s) \rightarrow_{\mathrm{SP}} (m', c', p', s')$, including a set of constraints over integers. We provide a selection of the rules in Figure 7. Rule **S-P-If-sym-true** fires when a branching instruction is to be executed and the guard has been reduced to either an integer or a value in $\mathcal{X}$, together denoted by the set $\mathbf{V}_{is}$. In this case we can proceed with the true branch, recording in the set of integer constraints the fact that the guard is greater than 0. Rule **S-P-If-prob-false** handles a branching instruction whose guard reduces to a value in $\mathcal{X}_p$. In this case we can proceed with both branches (here we only show one of the two rules), recording the conditioning fact on the list of probabilistic constraints. Finally, rule **S-P-Lap-Ass** handles probabilistic assignment. After having reduced both the expression for the mean and the expression for the scale to values, we check that both are either integers or symbolic integers; if that is the case, we record that the scale is greater than 0 and we add a probabilistic constraint stating that the modified variable now points to a probabilistic symbolic value associated with a Laplace distribution.
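The effect of rule **S-P-Lap-Ass** can be sketched as follows, with mean and scale already reduced to (possibly symbolic) integer values; the tuple encoding of constraints and all names are our own illustration:

```python
import itertools

_ctr = itertools.count()

def s_p_lap_ass(mem, var, v_a, v_b, p, s):
    """Sketch of S-P-Lap-Ass: record v_b > 0 in the integer-constraint set s,
    append a Laplace constraint to the probabilistic path constraints p, and
    bind var to a fresh probabilistic symbol."""
    X = f"P{next(_ctr)}"                 # fresh symbol in X_p
    s2 = s | {("gt0", v_b)}              # s''' = s'' ∪ {v_b > 0}
    p2 = p + [("lap", X, v_a, v_b)]      # p''' = p'' @ [X <-$ lap_{v_b}(v_a)]
    mem2 = dict(mem)
    mem2[var] = X                        # m[x -> X]
    return mem2, p2, s2
```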

The semantics of SPFOR has two sources of nondeterminism: guards which reduce to symbolic values, and guards which reduce to probabilistic symbolic values. The collecting semantics of SPFOR, specified by judgments as


### **S-P-Lap-Ass**

$$\frac{\begin{array}{c}
(m, e_a, p, s) \downarrow_{\mathrm{SP}} (v_a, p', s') \quad (m, e_b, p', s') \downarrow_{\mathrm{SP}} (v_b, p'', s'') \quad X\ \mathbf{fresh}(\mathcal{X}_p)\\
v_a, v_b \in \mathbf{V}_{is} \quad s''' \equiv s'' \cup \{v_b > 0\} \quad p''' \equiv p''\,@\,[X \xleftarrow{\$} \mathrm{lap}_{v_b}(v_a)]
\end{array}}{(m,\ x \xleftarrow{\$} \mathrm{lap}_{e_b}(e_a),\ p,\ s) \rightarrow_{\mathrm{SP}} (m[x\mapsto X],\ \mathtt{skip},\ p''',\ s''')}$$

Fig. 7: SPFOR: Semantics of commands (selected rules)

$H \Rightarrow_{\mathrm{sp}} H'$ (for sets of configurations $H$ and $H'$), takes care of both of them. The rule for this judgment form is:

$$\begin{array}{c}
\textbf{s-p-collect}\\[4pt]
\dfrac{D_{[s]} \subseteq H \quad H' \equiv \{(m', c', p', s') \mid \exists (m, c, p, s) \in D_{[s]}\ \text{s.t.}\ (m, c, p, s) \rightarrow_{\mathrm{sp}} (m', c', p', s') \wedge \mathbf{SAT}(s')\}}{H \Rightarrow_{\mathrm{sp}} \big(H \setminus D_{[s]}\big) \cup H'}
\end{array}$$

Unlike in the deterministic case of the rule **Set-Step**, where only one configuration was chosen nondeterministically from the initial set, here we nondeterministically select a (maximal) set of configurations all sharing the same symbolic constraints. The notation $D_{[s]} \subseteq H$ means that $D_{[s]}$ is the maximal subset of configurations in $H$ which have $s$ as their set of constraints. We use $H \overset{D_{[s]}}{\Longrightarrow}_{\mathrm{sp}} H'$ when we want to make explicit the set of symbolic configurations, $D_{[s]}$, that we are using to make the step. Intuitively, **s-p-collect** starts from a set of configurations and reaches all of those that are reachable from it: all the configurations that have a satisfiable set of constraints and are reachable from one of the original configurations with one step of the symbolic semantics. Notice that a set of constraints can contain constraints involving probabilistic symbols, e.g. if the $i$-th element of an array is associated with a random expression. Nevertheless,

the predicate $\mathbf{SAT}(\cdot)$ does not need to take into consideration relations involving probabilistic symbolic constraints, but only relations involving symbolic values denoting integers. The following coverage lemma connects PFOR with SPFOR, ensuring that a concrete execution is covered by a symbolic one.
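The **s-p-collect** step can be sketched as below, where `sat` stands in for an external satisfiability check (e.g. an SMT solver call) and configurations are our own tuples whose fourth component is the (hashable) constraint set:

```python
# Sketch of s-p-collect: step every configuration sharing the constraint set
# s, keeping only successors whose new constraint set is satisfiable. `step`
# yields the one-step successors (m', c', p', s') of a configuration; `sat`
# is an assumed decision procedure over integer constraints.

def s_p_collect(H, s, step, sat):
    D = {cfg for cfg in H if cfg[3] == s}          # maximal D_[s] ⊆ H
    H2 = {succ for cfg in D for succ in step(cfg) if sat(succ[3])}
    return (H - D) | H2                            # (H \ D_[s]) ∪ H'
```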

**Lemma 6 (Probabilistic Unary Coverage).** If $H \overset{D_{[s]}}{\Longrightarrow}_{\mathrm{sp}} H'$ and $\sigma \models_I D_{[s]}$ then $\exists \sigma', D'_{[s']} \subseteq H'$ such that $\sigma' \models_I D'_{[s']}$, and $\sigma(D_{[s]}) \Rightarrow^*_c \sigma'(D'_{[s']})$.

### **5.2 SRPFOR**

The language presented in this section is the symbolic extension of the concrete language RPFOR. It can also be seen as the relational extension of SPFOR. The key part of this language's semantics is the handling of the probabilistic assignment, for which we provide two rules instead of one. The first one is the obvious one, which performs a standard symbolic probabilistic assignment. The second one implements a coupling semantics. The syntax of SRPFOR, presented in Figure 8, extends the syntax of RPFOR by adding symbolic values. The main change is in the grammar of expressions, while the syntax for commands is almost identical to that of RPFOR.

$$\begin{array}{l}
\mathcal{E}_{rs} \ni e_{sr} ::= e_s \mid \langle e_s | e_s \rangle \mid e_{sr} \oplus e_{sr} \mid a[e_{sr}]\\
\mathcal{C}_{rs} \ni c_{sr} ::= c_s \mid \langle c_s | c_s \rangle \mid c_{sr}; c_{sr} \mid x \leftarrow e_{sr} \mid a[e_{sr}] \leftarrow e_{sr} \mid x \xleftarrow{\$} \mathrm{lap}_{e_{sr}}(e_{sr}) \mid \ldots
\end{array}$$

Fig. 8: Syntax of SRPFOR

As in the case of RPFOR, only unary symbolic expressions and commands are admitted in the pairing construct. This invariant is maintained by the semantics rules. As for the other languages, we provide a big-step evaluation semantics for expressions, whose judgments are of the form $(m_1, m_2, e, p_1, p_2, s) \downarrow_{\mathrm{SRP}} (v, p'_1, p'_2, s')$. The only rule defining the judgment $\downarrow_{\mathrm{SRP}}$ is **S-R-P-Lift**, and it is presented in the extended version [11]. The rule first projects the symbolic relational expression on the left and evaluates it to a unary symbolic value, potentially updating the probabilistic symbolic constraints and the symbolic constraints. It then does the same, projecting the expression on the right but starting from the potentially updated constraints. The returned value is unary only when both evaluations returned equal integers; in all other cases a pair of values is returned. So, the relational symbolic semantics leverages the unary semantics. For the semantics of commands we use the following evaluation contexts to simplify the exposition:

$$\begin{array}{l}
\mathcal{CTX} ::= [\cdot] \mid \mathcal{CTX}; c\\
\mathcal{P} ::= \langle \cdot\,; c \mid \cdot \rangle \mid \langle \cdot \mid \cdot\,; c \rangle \mid \langle \cdot \mid \cdot \rangle \mid \langle \cdot\,; c \mid \cdot\,; c \rangle
\end{array}$$

Notice how $\mathcal{P}$ gets saturated by pairs of commands. Moreover, we separate commands into two classes. We call synchronizing all the commands in $\mathcal{C}_{rs}$ with the shapes $x \xleftarrow{\$} \mathrm{lap}_{e_2}(e_1)$ and $\langle x \xleftarrow{\$} \mathrm{lap}_{e_2}(e_1) \mid x' \xleftarrow{\$} \mathrm{lap}_{e'_2}(e'_1)\rangle$, since they allow synchronization of two runs using coupling rules. We call non synchronizing all the other commands.

**Semantics of non synchronizing commands** We consider judgments of the form $(m_1, m_2, c, p_1, p_2, s) \rightarrow_{\mathrm{SRP}} (m'_1, m'_2, c', p'_1, p'_2, s')$; a selection of the rules is given in Figure 9. An explanation of the rules follows. Rule **s-r-if-prob-prob-true-false** fires when evaluating a branching instruction whose guard evaluates on both sides to a probabilistic symbolic value. In this case the semantics can continue with the true branch on the left run and

### **s-r-if-prob-prob-true-false**

$$\frac{\begin{array}{c}
(m_1, m_2, e, p_1, p_2, s) \downarrow_{\mathrm{SRP}} (v, p'_1, p'_2, s') \quad \lfloor v\rfloor_1, \lfloor v\rfloor_2 \in \mathcal{X}_p\\
p''_1 \equiv p'_1\,@\,[\lfloor v\rfloor_1 > 0] \quad p''_2 \equiv p'_2\,@\,[\lfloor v\rfloor_2 \leq 0]
\end{array}}{(m_1, m_2, \mathtt{if}\ e\ \mathtt{then}\ c_{tt}\ \mathtt{else}\ c_{f\!f}, p_1, p_2, s) \rightarrow_{\mathrm{SRP}} (m_1, m_2, \langle\lfloor c_{tt}\rfloor_1|\lfloor c_{f\!f}\rfloor_2\rangle, p''_1, p''_2, s')}$$

**s-r-if-prob-sym-true-false**

$$\frac{\begin{array}{c}
(m_1, m_2, e, p_1, p_2, s) \downarrow_{\mathrm{SRP}} (v, p'_1, p'_2, s') \quad \lfloor v\rfloor_1 \in \mathcal{X}_p \quad \lfloor v\rfloor_2 \in \mathcal{X}\\
p''_1 \equiv p'_1\,@\,[\lfloor v\rfloor_1 > 0] \quad s'' \equiv s' \cup \{\lfloor v\rfloor_2 \le 0\}
\end{array}}{(m_1, m_2, \mathtt{if}\ e\ \mathtt{then}\ c_{tt}\ \mathtt{else}\ c_{f\!f}, p_1, p_2, s) \rightarrow_{\mathrm{SRP}} (m_1, m_2, \langle\lfloor c_{tt}\rfloor_1|\lfloor c_{f\!f}\rfloor_2\rangle, p''_1, p'_2, s'')}$$

**s-r-pair-lap-skip**

$$\frac{(m_1,\ x \xleftarrow{\$} \mathrm{lap}_{e_b}(e_a),\ p_1,\ s) \rightarrow_{\mathrm{SP}} (m'_1, \mathtt{skip}, p'_1, s')}{(m_1, m_2, \langle x \xleftarrow{\$} \mathrm{lap}_{e_b}(e_a) \mid \mathtt{skip}\rangle, p_1, p_2, s) \rightarrow_{\mathrm{SRP}} (m'_1, m_2, \langle\mathtt{skip}|\mathtt{skip}\rangle, p'_1, p_2, s')}$$

**s-r-pair-lapleft-sync**

$$\frac{c \not\equiv x' \xleftarrow{\$} \mathrm{lap}_{e'_b}(e'_a) \quad \mathcal{P} \equiv \langle\cdot|\cdot\rangle \quad (m_2, c, p_2, s) \rightarrow_{\mathrm{SP}} (m'_2, c', p'_2, s')}{(m_1, m_2, \mathcal{P}(x \xleftarrow{\$} \mathrm{lap}_{e_b}(e_a), c), p_1, p_2, s) \rightarrow_{\mathrm{SRP}} (m_1, m'_2, \langle x \xleftarrow{\$} \mathrm{lap}_{e_b}(e_a) \mid c'\rangle, p_1, p'_2, s')}$$

**s-r-pair-ctxt-1**

$$\frac{\begin{array}{c}
x \xleftarrow{\$} \mathrm{lap}_{e_b}(e_a) \notin \{c_1, c_2\} \quad |\{c_1, c_2\}| = 2 \quad \{1, 2\} = \{i, j\}\\
c'_i \equiv c_i \quad p'_i \equiv p_i \quad m'_i \equiv m_i \quad (m_j, c_j, p_j, s) \rightarrow_{\mathrm{SP}} (m'_j, c'_j, p'_j, s')
\end{array}}{(m_1, m_2, \mathcal{P}(c_1, c_2), p_1, p_2, s) \rightarrow_{\mathrm{SRP}} (m'_1, m'_2, \mathcal{P}(c'_1, c'_2), p'_1, p'_2, s')}$$

**s-r-pair-ctxt-2**

$$\frac{\mathcal{P} \not\equiv \langle \cdot \mid \cdot \rangle \quad (m_1, m_2, \langle c_1 | c_2 \rangle, p_1, p_2, s) \rightarrow_{\mathrm{SRP}} (m'_1, m'_2, \langle c'_1 | c'_2 \rangle, p'_1, p'_2, s')}{(m_1, m_2, \mathcal{P}(c_1, c_2), p_1, p_2, s) \rightarrow_{\mathrm{SRP}} (m'_1, m'_2, \mathcal{P}(c'_1, c'_2), p'_1, p'_2, s')}$$

Fig. 9: SRPFOR: semantics of non synchronizing commands (selected rules)

with the false branch on the right one. Notice that commands are projected to avoid pairing commands appearing in a nested form. Rule **s-r-if-prob-sym-true-false** applies when the guard of a branching instruction evaluates to a probabilistic symbolic value on the left run and a symbolic integer value on the right one. The rule allows us to continue on the true branch on the left run and on the false branch on the right one. Notice that in one case the probabilistic list of constraints is updated, while in the other it is the symbolic set of constraints.

Rule **s-r-pair-lap-skip** handles the pairing command where on the left-hand side we have a probabilistic assignment and on the right a skip instruction. In this case, there is no hope for synchronization between the two runs, and hence we can just perform the left probabilistic assignment relying on the unary symbolic semantics. Rule **s-r-pair-lapleft-sync** instead applies when on the left we have a probabilistic assignment and on the right we have another arbitrary command. In this case we can hope to reach a situation where another probabilistic assignment appears on the right run; hence, it makes sense to continue the computation in a unary way on the right side. Again, $\rightarrow_{\mathrm{SRP}}$ is a nondeterministic semantics. The nondeterminism comes from the use of probabilistic symbols and symbolic values as guards, and from the relational approach. So, in order to collect all the possible traces stemming from such nondeterminism, we define a collecting semantics relating sets of configurations to sets of configurations.

The semantics is specified through a judgment of the form $SR \Rightarrow_{\mathrm{srp}} SR'$, with $SR, SR' \in \mathcal{P}(\mathcal{M}_{\mathrm{SP}} \times \mathcal{M}_{\mathrm{SP}} \times \mathcal{C}_{rs} \times \mathcal{P}_{\mathrm{SP}} \times \mathcal{P}_{\mathrm{SP}} \times \mathcal{S})$. The only rule defining the judgment is the following natural lifting of the one for the unary semantics.

### **s-r-p-collect**

$$\frac{\begin{array}{c}
R_{[s]} \subseteq SR\\
SR' \equiv \{(m'_1, m'_2, c', p'_1, p'_2, s') \mid \exists (m_1, m_2, c, p_1, p_2, s) \in R_{[s]}\ \text{s.t.}\\
(m_1, m_2, c, p_1, p_2, s) \rightarrow_{\mathrm{SRP}} (m'_1, m'_2, c', p'_1, p'_2, s') \wedge \mathbf{SAT}(s')\}
\end{array}}{SR \Rightarrow_{\mathrm{srp}} \big(SR \setminus R_{[s]}\big) \cup SR'}$$

The rule, and the auxiliary notation $R_{[s]}$, is quite similar to that of SPFOR; the only difference is that here sets of symbolic relational probabilistic configurations are considered instead of symbolic (unary) probabilistic configurations.

**Semantics of synchronizing commands** We define a new judgment of the form G ⇝ G′, with G, G′ ∈ P(P(*M*SP × *M*SP × Crs × SP × SP × S)). In Figure 10 we give a selection of the rules. Rule **Proof-Step-No-Sync** applies when no synchronizing commands are involved, and hence no coupling rule can be applied. In the other rules, we use the variable ε_c to symbolically count the privacy budget spent in the current relational execution. The variable gets increased when the rule **Proof-Step-Lap-Gen** fires. This symbolic counter variable is useful when trying to prove equality of certain variables without spending more than a specific budget. This rule is the one we can use in most cases when we need to reason about couplings of Laplace distributions. In the set of sets of configurations G, a set of configurations SR is nondeterministically chosen, and among the elements of SR a configuration is also nondeterministically chosen. Using contexts, we check that in the selected configuration the next command to execute is a probabilistic assignment. After reducing both the mean and the scale expression to values, and after verifying (that is, assuming in the set of constraints) that in the two runs the scales have the same value, the rule adds to the set of constraints a new element, E″ = E′ + |⌊v_a⌋₁ − ⌊v_a⌋₂| · K′, where K, K′, E″ are fresh symbols denoting integers and E′ is the symbolic integer to which the budget variable ε_c maps. Notice that ε_c needs to point to the same symbol in both memories: it is a shared variable tracking the privacy budget spent so far in both runs. This new constraint increases the budget spent. The other constraint added is the real coupling relation, X₁ + K = X₂, where X₁, X₂ are fresh in X. Later, K will be existentially quantified in order to search for a proof of ε-indistinguishability.
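To make the budget-accounting step of **Proof-Step-Lap-Gen** concrete, the sketch below (our own simplification, using strings as symbolic terms; nothing here comes from the CRSE implementation) emits the constraints described above for one coupled Laplace sampling:

```python
import itertools

_fresh = itertools.count()

def fresh(prefix):
    # Generate a fresh symbolic name, e.g. "X3".
    return f"{prefix}{next(_fresh)}"

def lap_gen_constraints(va1, va2, vb1, vb2, E, E_prev):
    """Constraints added by a Proof-Step-Lap-Gen-like step for
    x <-$ lap_vb(va): couple the samples as X1 + K = X2 and charge
    the budget counter with |va1 - va2| * K'."""
    X1, X2, K, K1, E_new = (fresh(p) for p in ("X", "X", "K", "K'", "E"))
    constraints = [
        f"{vb1} = {vb2}",            # same scale in both runs
        f"{vb1} > 0",
        f"{X1} + {K} = {X2}",        # coupling between the two samples
        f"{K} <= {K1}",
        f"{K1} * {E} = {vb1}",       # relate K' to the scale and epsilon
        f"{E_new} = {E_prev} + |{va1} - {va2}| * {K1}",  # budget spent
    ]
    return (X1, X2, E_new), constraints

(_, _, E1), cs = lap_gen_constraints("Qd1", "Qd2", "b", "b", "E", "E0")
print(cs[-1])  # E4 = E0 + |Qd1 - Qd2| * K'3
```

Existentially quantifying the `K` symbols afterwards corresponds to the proof search for ε-indistinguishability mentioned above.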

Rule **Proof-Step-Avoc** does not use any coupling rule but treats the samples in a purely symbolic manner. It intuitively asserts that the two samples are

**Proof-Step-No-Sync**

$$\begin{array}{c} \mathcal{SR} \in \mathcal{G} \qquad \mathcal{SR} \Rightarrow_{\mathrm{srp}} \mathcal{SR}' \qquad \mathcal{G}' \equiv (\mathcal{G} \setminus \{\mathcal{SR}\}) \cup \{\mathcal{SR}'\} \\ \hline \mathcal{G} \rightsquigarrow \mathcal{G}' \end{array}$$

**Proof-Step-No-Coup**

$$\begin{array}{c} (m_1, m_2, \mathcal{CTX}[x \xleftarrow{\$} lap_{e_b}(e_a)], p_1, p_2, s) \in \mathcal{SR} \in \mathcal{G} \\ (m_1, m_2, e_a, p_1, p_2, s) \downarrow_{\mathrm{SRP}} (v_a, p'_1, p'_2, s_a) \\ (m_1, m_2, e_b, p'_1, p'_2, s_a) \downarrow_{\mathrm{SRP}} (v_b, p''_1, p''_2, s_b) \\ X_1, X_2 \text{ fresh}(\mathcal{X}_p) \qquad m'_1 \equiv m_1[x \mapsto X_1] \qquad m'_2 \equiv m_2[x \mapsto X_2] \\ p'''_1 \equiv p''_1 @ [X_1 \xleftarrow{\$} lap_{\lfloor v_b \rfloor_1}(\lfloor v_a \rfloor_1)] \qquad p'''_2 \equiv p''_2 @ [X_2 \xleftarrow{\$} lap_{\lfloor v_b \rfloor_2}(\lfloor v_a \rfloor_2)] \\ \mathcal{SR}' \equiv (\mathcal{SR} \setminus \{(m_1, m_2, \mathcal{CTX}[x \xleftarrow{\$} lap_{e_b}(e_a)], p_1, p_2, s)\}) \cup \{(m'_1, m'_2, \mathcal{CTX}[\mathtt{skip}], p'''_1, p'''_2, s_b)\} \\ \mathcal{G}' \equiv (\mathcal{G} \setminus \{\mathcal{SR}\}) \cup \{\mathcal{SR}'\} \\ \hline \mathcal{G} \rightsquigarrow \mathcal{G}' \end{array}$$

#### **Proof-Step-Avoc**

$$\begin{array}{c} (m_1, m_2, \mathcal{CTX}[x \xleftarrow{\$} lap_{e_b}(e_a)], p_1, p_2, s) \in \mathcal{SR} \in \mathcal{G} \\ (m_1, m_2, e_a, p_1, p_2, s) \downarrow_{\mathrm{SRP}} (v_a, p'_1, p'_2, s_a) \\ (m_1, m_2, e_b, p'_1, p'_2, s_a) \downarrow_{\mathrm{SRP}} (v_b, p''_1, p''_2, s_b) \\ X_1, X_2 \text{ fresh}(\mathcal{X}) \qquad m'_1 \equiv m_1[x \mapsto X_1] \qquad m'_2 \equiv m_2[x \mapsto X_2] \\ \mathcal{SR}' \equiv (\mathcal{SR} \setminus \{(m_1, m_2, \mathcal{CTX}[x \xleftarrow{\$} lap_{e_b}(e_a)], p_1, p_2, s)\}) \cup \{(m'_1, m'_2, \mathcal{CTX}[\mathtt{skip}], p''_1, p''_2, s_b)\} \\ \mathcal{G}' \equiv (\mathcal{G} \setminus \{\mathcal{SR}\}) \cup \{\mathcal{SR}'\} \\ \hline \mathcal{G} \rightsquigarrow \mathcal{G}' \end{array}$$

#### **Proof-Step-Lap-Gen**

$$\begin{array}{c} (m_1, m_2, \mathcal{CTX}[x \xleftarrow{\$} lap_{e_b}(e_a)], p_1, p_2, s) \in \mathcal{SR} \in \mathcal{G} \\ (m_1, m_2, e_a, p_1, p_2, s) \downarrow_{\mathrm{SRP}} (v_a, p'_1, p'_2, s_a) \\ (m_1, m_2, e_b, p'_1, p'_2, s_a) \downarrow_{\mathrm{SRP}} (v_b, p''_1, p''_2, s_b) \\ s' \equiv s_b \cup \{\lfloor v_b \rfloor_1 = \lfloor v_b \rfloor_2, \lfloor v_b \rfloor_1 > 0\} \qquad m_1(\epsilon_c) = E' = m_2(\epsilon_c) \\ E'', X_1, X_2, K, K' \text{ fresh}(\mathcal{X}) \qquad m'_1 \equiv m_1[x \mapsto X_1][\epsilon_c \mapsto E''] \\ m'_2 \equiv m_2[x \mapsto X_2][\epsilon_c \mapsto E''] \qquad m(\epsilon) = E \\ s'' \equiv s' \cup \{X_1 + K = X_2,\ K \le K',\ K' \cdot E = \lfloor v_b \rfloor_1,\ E'' = E' + |\lfloor v_a \rfloor_1 - \lfloor v_a \rfloor_2| \cdot K'\} \\ p'''_1 \equiv p''_1 @ [X_1 \xleftarrow{\$} lap_{\lfloor v_b \rfloor_1}(\lfloor v_a \rfloor_1)] \qquad p'''_2 \equiv p''_2 @ [X_2 \xleftarrow{\$} lap_{\lfloor v_b \rfloor_2}(\lfloor v_a \rfloor_2)] \\ \mathcal{SR}' \equiv (\mathcal{SR} \setminus \{(m_1, m_2, \mathcal{CTX}[x \xleftarrow{\$} lap_{e_b}(e_a)], p_1, p_2, s)\}) \cup \{(m'_1, m'_2, \mathcal{CTX}[\mathtt{skip}], p'''_1, p'''_2, s'')\} \\ \mathcal{G}' \equiv (\mathcal{G} \setminus \{\mathcal{SR}\}) \cup \{\mathcal{SR}'\} \\ \hline \mathcal{G} \rightsquigarrow \mathcal{G}' \end{array}$$

Fig. 10: SRPFOR: Proof collecting semantics, selected rules

drawn from the distributions and assigns to them arbitrary integers, free to vary over the whole domain of the Laplace distribution.

Finally, rule **Proof-Step-No-Coup** applies to synchronizing commands as well, but it does not add any relational constraint on the samples. This rule intuitively means that we are not correlating the two samples in any way. Notice that since we are not using any coupling rule, we do not need to check that the scale value is the same in the two runs, as required by the previous rule. We can think of this rule as a way to encode the relational semantics of the program in an expression which can later be fed as input to other tools.

The main difference with the previous rule is that here we treat the sampling instruction symbolically, which is why the fresh symbols are drawn from X_p, denoting subdistributions, rather than from X, denoting sampled integers. When the program involves a synchronizing command, we effectively fork the execution when it is time to execute it. The sets of configurations allow us to explore different paths, one for every applicable rule.

# **6 Metatheory**

The coverage lemma can be extended also to the relational setting.

**Lemma 7 (Probabilistic Relational Coverage).** If SR ⇒srp SR′, where R[s] is the subset selected by the rule **s-r-p-collect**, and σ ⊨I R[s], then there exist σ′ and R[s′] such that R[s′] ⊆ SR′, σ′ ⊨I R[s′], and σ(R[s]) ⇒∗rp σ′(R[s′]).

This can also be extended to the proof collecting semantics ⇝ if we consider only the fragment that uses the rules **Proof-Step-No-Sync** and **Proof-Step-No-Coup**.

The language of relational assertions Φ, Ψ, ... is defined using first-order predicate logic formulas involving relational program expressions and logical variables in LogVar. The interpretation of a relational assertion is naturally defined as a subset of M_c × M_c, the set of pairs of memories modeling the assertion. We will denote by (·)_{m₁|m₂} the substitution function mapping the variables in an assertion to the values they have in a memory (unary or relational). More details are in [10].

**Definition 4.** Let Φ, Ψ be relational assertions, c ∈ Cr, and I : LogVar → R an interpretation defined on ε. We say that Φ yields Ψ through c within ε under I (and we write $I \vDash c : \Phi \xrightarrow{\epsilon} \Psi$) iff

$$\forall (m_1, m_2, \mathtt{skip}, p_1, p_2, s) \in \bigcup_{D \in H_{sr}} D.\ \exists \bar{k}.\ s \Longrightarrow (\Psi \land \epsilon_c \le \epsilon)_{m_1|m_2}$$

where H_sr is obtained by the reduction {{(m^I₁, m^I₂, c, [], [], (Φ)_{m^I₁|m^I₂})}} ⇝∗ H_sr, with m^I ≡ m^I₁|m^I₂ and m^I₁, m^I₂ fully symbolic memories in which ε_c is initialized to 0, and k̄ = k₁, k₂, ... are the symbols generated by the rules for synchronizing commands.

The idea of this definition is to enable automated proof search. When proving differential privacy, we will usually take Ψ to be equality of the output variables in the two runs and Φ to be our preconditions. We can now prove the soundness of our approach.

**Lemma 8 (Soundness).** Let c ∈ Cr. If $I \vDash c : D_1 \sim D_2 \xrightarrow{\epsilon} o_1 = o_2$ then c is ε-differentially private.

We can also prove the soundness of refutations obtained by the semantics.

**Lemma 9 (Soundness for refutation).** Suppose that we have a reduction {{(m₁, m₂, c, [], [], (Φ)_{m₁|m₂})}} ⇝∗ G, and H_s ∈ H ∈ G, and ∃σ ⊨Z s such that $\Delta_\epsilon(\llbracket \lfloor c \rfloor_1 \rrbracket_{\mathcal{C}}(\sigma(m_1)), \llbracket \lfloor c \rfloor_2 \rrbracket_{\mathcal{C}}(\sigma(m_2))) > 0$; then c is not ε-differentially private.

# **7 Strategies for counterexample finding**

Lemma 9 is hard to use to find counterexamples in practice. For this reason we now describe three strategies that can help reduce the effort of counterexample finding. These strategies help isolate traces that could potentially lead to violations. For this we first need some notation. Given a set of constraints s, we define the triple Ω = ⟨Ω₁, Ω₂, C(k̄)⟩ ≡ ⟨⌊s⌋₁, ⌊s⌋₂, s \ (⌊s⌋₁ ∪ ⌊s⌋₂)⟩. We sometimes abuse notation and consider Ω also as a set of constraints given by the union of its first, second and third projections, and we will also consider a set of constraints as a single proposition given by the conjunction of its elements. The set C(k̄) contains relational constraints coming either from preconditions or invariants, or from the rule **Proof-Step-Lap-Gen**. The potentially empty vector k̄ = K₁, ..., Kₙ is the set of fresh symbols K generated by that rule. In the rest of the paper we will make the following simplifying assumption.
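A concrete reading of the triple Ω: if each constraint records which runs it mentions, the three projections can be computed mechanically. The following sketch is our own illustration, with constraints tagged by the run indices of the symbols they contain:

```python
def split_constraints(s):
    """Split a set of constraints into (Omega1, Omega2, C):
    left-only, right-only, and relational constraints.
    Each constraint is a (formula, runs) pair with runs a subset of {1, 2}."""
    omega1 = {c for c, runs in s if runs == {1}}
    omega2 = {c for c, runs in s if runs == {2}}
    relational = {c for c, runs in s if runs == {1, 2}}
    return omega1, omega2, relational

# Toy trace: one branch constraint per run plus an adjacency precondition.
s = {("T1 > Q1", frozenset({1})),
     ("T2 <= Q2", frozenset({2})),
     ("|Q1 - Q2| <= 1", frozenset({1, 2}))}
print(split_constraints(s))
```

The adjacency constraint mentions symbols of both runs and therefore lands in C(k̄), matching the definition above.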

**Assumption 1** Consider c ∈ Cr with output variable o. Then c is such that {{(m₁, m₂, c, [], [], s)}} ⇝∗ G and ∀H. ⟨Ω₁, C(k̄), Ω₂⟩ ∈ H ∈ G. *Final*(H) ∧ o₁ = o₂ ⟹ Ω₁ ⇔ Ω₂.

This assumption allows us to consider only programs for which, in order for the output variable to assume the same value on both runs, the two runs must follow the same branches. That is, if the two outputs differ, then the two executions must have, at some point, taken different branches.

The following definition will be used to distinguish relational traces which are reachable on one run but not on the other. We call these traces orthogonal.

**Definition 5.** A final relational symbolic trace is orthogonal when its set of constraints is such that ∃σ. σ ⊨ Ω₁ ∧ C(k̄) and σ ⊭ Ω₂. That is, a trace for which ¬(Ω₁ ∧ C(k̄) ⟹ Ω₂) is satisfiable.

The next definition, instead, will be used to isolate relational traces for which it is not possible that the left one is executed but the right one is not. We call these traces specular.

**Definition 6.** A final relational symbolic trace is specular if ⊨ ∃k̄. Ω₁ ∧ C(k̄) ⟹ Ω₂.

The constraint Ω₁ ∧ C(k̄) includes all the constraints coming from the branching of the left projection of the symbolic execution, all the relational assumptions such as the adjacency condition, and all the constraints added by the potentially fired **Proof-Step-Lap-Gen** rule. A specular trace is one whose left projection constraints, together with the relational assumptions, imply the right projection constraints. We now describe our three strategies.

**Strategy A** In this strategy CRSE uses only the rule **Proof-Step-Avoc** for sampling instructions; moreover, the strategy searches for orthogonal relational traces. Under Assumption 1, if such a trace exists, then it must be the case that the program can output one value on one run with some probability, while the same value has probability 0 of being output on the second run. This implies that for some input the program has an unbounded privacy loss. To implement this strategy, CRSE looks for orthogonal relational traces ⟨m₁, m₂, skip, p₁, p₂, Ω⟩ such that ∃σ. σ ⊨ Ω₁ ∧ C(k̄) but σ ⊭ Ω₂. Notice that using this strategy k̄ will always be empty, as the rule used for samplings does not introduce any coupling between the two samples.

**Strategy B** This strategy symbolically executes the program in order to find a specular trace for which, no matter how we relate the various pairs of samples X^i₁, X^i₂ in the two runs within the budget (using the relational schema X^i₁ + Kᵢ = X^i₂), the postcondition is always false. That is, CRSE looks for specular relational traces ⟨m₁, m₂, skip, p₁, p₂, Ω⟩ such that: ∀k̄. [(Ω₁ ∧ C(k̄) ⟹ Ω₂) ∧ (ε_c ≤ ε)_{m₁|m₂}] ⟹ (o₁ ≠ o₂)_{m₁|m₂}.

**Strategy C** This strategy looks for relational traces in which the output variable takes the same value on the two runs but too much of the budget was spent. That is, CRSE looks for traces ⟨m₁, m₂, skip, p₁, p₂, Ω⟩ such that: ∀k̄. [Ω₁ ∧ C(k̄) ∧ Ω₂ ⟹ (o₁ = o₂)_{m₁|m₂}] ⟹ (ε_c > ε)_{m₁|m₂}.

Of the presented strategies, only strategy A is sound with respect to counterexample finding; the other two apply when the algorithm cannot be proven differentially private by any combination of the rules. In this second case, though, CRSE provides counterexamples which agree with other refutation-oriented results in the literature. These strategies are hence useful heuristics that can be applied in some situations.

# **8 Examples**

In this section we will review the examples presented in Section 2 and variations thereof to show how CRSE works.

**Unsafe Laplace mechanism: Algorithm 4.** This algorithm is not ε-d.p. because the noise is constant and is not calibrated to the sensitivity r of the query q. As a consequence, any attempt based on coupling rules uses too much of the budget. This program has only one possible final relational trace:

⟨m₁, m₂, skip, p₁, p₂, (Ω₁, C(k̄), Ω₂)⟩. Since there are no branching instructions, Ω₁ = {⌊2E⌋₁ > 0} and Ω₂ = ∅, where m₁(ε) = m₂(ε) = E. Since there is one sampling instruction, C(k̄) will include {|Q_{d₁} − Q_{d₂}| ≤ R, P₁ + K = P₂, E_c = |K| · K′ · E, O₁ = P₁ + Q_{d₁}, O₂ = P₂ + Q_{d₂}}, with m₁(o) = O₁, m₂(o) = O₂, m₁(ε_c) = m₂(ε_c) = E_c, mᵢ(ρ) = Pᵢ. Intuitively, given this set of constraints, if it has to be the case that O₁ = O₂, then Q_{d₁} − Q_{d₂} = K. But Q_{d₁} − Q_{d₂} can be as large as R, and hence E_c is at least R · E. So, if we want to equate the two outputs, we need to spend r times the budget. Any relational input satisfying the precondition will give us a counterexample, provided the two projections differ.
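The failure can also be seen numerically: with a constant noise scale b, the log-ratio of the two output densities of the mechanism reaches |q(D₁) − q(D₂)|/b, which grows with the sensitivity r instead of staying below ε. Below is a small self-contained check; the concrete values b = 1, ε = 1 and r = 5 are our own illustrative choices, not taken from Algorithm 4.

```python
import math

def laplace_logpdf(x, mean, scale):
    # Log density of Lap(mean, scale) at x.
    return -math.log(2 * scale) - abs(x - mean) / scale

# Constant, uncalibrated noise: scale b = 1 regardless of sensitivity.
b, eps, r = 1.0, 1.0, 5.0
q1, q2 = 0.0, r          # adjacent inputs on which the query moves by r

# Privacy loss log(p1(o)/p2(o)) at a few outputs o; the worst case is r/b.
loss = max(laplace_logpdf(o, q1, b) - laplace_logpdf(o, q2, b)
           for o in [q1 - 1, q1, (q1 + q2) / 2, q2, q2 + 1])
print(loss > eps)  # True: the loss reaches r/b = 5, far above eps = 1
```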



**A safe Laplace mechanism.** By substituting line 2 in Algorithm 4 with ρ ←$ lap_{r∗ε}(0) we get an ε-DP algorithm. Indeed, when executing that line, CRSE would generate the constraint P₁ + K₀ = P₂ ∧ |K₀ + 0 − 0| ≤ K₁ ∧ O₁ = V₁ + P₁ ∧ O₂ = V₂ + P₂, which by instantiating K₀ = V₁ − V₂ and K₁ = |V₁ − V₂| implies O₁ = O₂ ∧ E_c ≤ E.
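For comparison, the textbook calibration adds Lap(r/ε) noise to a query of sensitivity r, which bounds the same log-density ratio by ε everywhere. The sketch below uses this standard parameterisation rather than the paper's lap_{r∗ε} notation (our own illustration; the concrete values are arbitrary):

```python
import math

def laplace_logpdf(x, mean, scale):
    # Log density of Lap(mean, scale) at x.
    return -math.log(2 * scale) - abs(x - mean) / scale

eps, r = 1.0, 5.0
q1, q2 = 0.0, r              # adjacent inputs, |q1 - q2| <= r
scale = r / eps              # calibrated scale

# The log-ratio (|o - q2| - |o - q1|) / scale is at most |q1 - q2| / scale.
worst = max(abs(laplace_logpdf(o, q1, scale) - laplace_logpdf(o, q2, scale))
            for o in [q1 + 0.01 * k for k in range(-1000, 1500)])
print(worst <= eps + 1e-9)  # True: the loss never exceeds eps
```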

**Unsafe sparse vector implementation: Algorithm 2.** We already discussed why this algorithm is not ε-differentially private. Algorithm 2 satisfies Assumption 1 because it outputs the whole array o, which takes values of the form ⊥^i t or ⊥^n for 1 ≤ i ≤ n and t ∈ ℝ. The array, hence, encodes the whole trace. So if two runs of the algorithm output the same value, it must be the case that they followed the same branching instructions. Let us first notice that the algorithm is trivially differentially private for any fixed number of iterations n.

Fig. 11: Two runs of Alg. 2.

Indeed, it is enough to apply the sequential composition theorem and get the obvious bound $\frac{\epsilon}{4} \cdot n$.

CRSE can prove this by applying the rule **Proof-Step-Lap-Gen** n times, and then choosing K₁, ..., Kₙ all equal to 0. This would imply the statement of equality of the output variables, spending less than $\frac{\epsilon}{4} \cdot n$. A potential counterexample can be found in 5 iterations. If we apply strategy B to this algorithm and follow the relational symbolic trace that applies the rule **Proof-Step-Lap-Gen** for all the samplings, we can isolate the relational specular trace shown in Figure 11, which corresponds to the left execution following the false branch for the first four iterations and then following the true branch, setting the fifth element of the array to the sampled value. Let us denote the final relational configuration by ⟨m₁, m₂, skip, p₁, p₂, s⟩. The set of constraints is as follows:

$$s = \langle \Omega_1, C(\bar{k}), \Omega_2 \rangle = \langle \{T_1 > S_1^1,\ T_1 > S_1^2,\ T_1 > S_1^3,\ T_1 > S_1^4,\ T_1 \le S_1^5\},\ \{T_1 + k_0 = T_2,\ S_1^1 + k_1 = S_2^1,\ S_1^2 + k_2 = S_2^2,\ S_1^3 + k_3 = S_2^3,\ S_1^4 + k_4 = S_2^4,\ S_1^5 + k_5 = S_2^5,\ E_6 = k_0 \cdot \tfrac{\epsilon}{2} + \tfrac{\epsilon}{4} \cdot \textstyle\sum_{i=1}^{4} k_i, \ldots\},\ \{T_2 > S_2^1,\ T_2 > S_2^2,\ T_2 > S_2^3,\ T_2 > S_2^4,\ T_2 \le S_2^5\} \rangle$$

with $m_1(\epsilon_c) = m_2(\epsilon_c) = E_6$, $m_1(o) = [S_1^1, \ldots, S_1^5]$, $m_2(o) = [S_2^1, \ldots, S_2^5]$, $m_1(t) = T_1$, $m_2(t) = T_2$.

We can see that strategy B applies, because we have ⊨ ∀k̄. [(Ω₁ ∧ C(k̄) ⟹ Ω₂) ∧ (ε_c ≤ ε)_{m₁|m₂}] ⟹ (o₁ ≠ o₂)_{m₁|m₂}. Computing the probability associated with these two traces, we can verify that we have a counterexample. This pair of traces is, in fact, the same that was found in [16] for a slightly more general version of Algorithm 2. Strategy B selects this relational trace since, in order to make sure that the traces follow the same branches, the coupling rules necessarily enforce that the two samples released are different, preventing CRSE from proving equality of the output variables in the two runs.
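For readers who want to experiment, the unsafe behaviour discussed above can be reproduced with a small script. The following is our own reconstruction of an unsafe sparse-vector variant in the spirit of Algorithm 2 (whose exact listing appears in Section 2, not here); the noise scales 2/ε and 4/ε follow the usual sparse-vector calibration, and releasing the noisy sample on the true branch is precisely the flaw:

```python
import math
import random

def lap(scale):
    # Sample Laplace(0, scale) via inverse CDF.
    u = random.random() - 0.5
    return -scale * math.copysign(math.log(1 - 2 * abs(u)), u)

def unsafe_svt(queries, threshold, eps):
    """Unsafe sparse-vector sketch: on the first noisy query that crosses
    the noisy threshold, the noisy *sample* itself is released."""
    t_hat = threshold + lap(2 / eps)
    out = []
    for q in queries:
        s_hat = q + lap(4 / eps)
        if s_hat >= t_hat:
            out.append(s_hat)   # releasing the sample leaks too much
            break
        out.append(None)        # the "bottom" answer
    return out

random.seed(0)
print(unsafe_svt([0.1, 0.2, 0.3, 0.4, 5.0], threshold=1.0, eps=1.0))
```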

**Unsafe sparse vector implementation: Algorithm 3.** This algorithm also satisfies Assumption 1. The algorithm is ε-differentially private for one iteration. This is because, intuitively, adding noise to the threshold protects the result of the query at the branching instruction, but only for one iteration. The algorithm is not ε-differentially private, for any finite ε, already at the second iteration, and a witness for this can be found using CRSE. We can see this using strategy A. Thanks to this strategy we will isolate a relational orthogonal trace, similar to the one found in [16] for the same algorithm. CRSE will unfold the loop twice, and it will scan all relational traces to see if there is an orthogonal one. In particular, consider the relational trace that corresponds to the output o₁ = o₂ = [⊥, ⊤], that is, the trace with set of constraints ⟨Ω₁, C(k̄), Ω₂⟩ = ⟨{T₁ > q1d₁, T₁ ≤ q2d₁}, {|q1d₁ − q1d₂| ≤ 1, |q2d₁ − q2d₂| ≤ 1}, {T₂ > q1d₂, T₂ ≤ q2d₂}⟩. Since the vector k̄ is empty, we can omit it and just write C. It is easy to see now that the following σ ≡ [q1d₁ ↦ 0, q2d₁ ↦ 1, q1d₂ ↦ 1, q2d₂ ↦ 0] proves that this relational trace is orthogonal: that is, σ ⊨ Ω₁ ∧ C, but σ ⊭ Ω₂.
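The orthogonality of this trace can be checked mechanically: under σ, Ω₁ leaves room for the threshold symbol (0 < T₁ ≤ 1), while Ω₂ demands 1 < T₂ ≤ 0, which is empty. A direct encoding of this interval check (our own, with the constraint shapes hard-coded):

```python
def satisfiable(lower_strict, upper_inclusive):
    # Constraints of the form  lower < T <= upper  are satisfiable
    # over the reals iff lower < upper.
    return lower_strict < upper_inclusive

# sigma from the text: q1d1 -> 0, q2d1 -> 1, q1d2 -> 1, q2d2 -> 0
sigma = {"q1d1": 0, "q2d1": 1, "q1d2": 1, "q2d2": 0}

# Omega1 = {T1 > q1d1, T1 <= q2d1};  Omega2 = {T2 > q1d2, T2 <= q2d2}
omega1_sat = satisfiable(sigma["q1d1"], sigma["q2d1"])  # 0 < T1 <= 1
omega2_sat = satisfiable(sigma["q1d2"], sigma["q2d2"])  # 1 < T2 <= 0

print(omega1_sat, omega2_sat)  # True False: the trace is orthogonal
```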

Indeed, if we consider two inputs D₁, D₂ and two queries q₁, q₂ such that q₁(D₁) = q₂(D₂) = 0 and q₂(D₁) = q₁(D₂) = 1, we get that the probability of outputting the value o = [⊥, ⊤] is positive in the first run, but it is 0 in the second. Hence, the algorithm can only be proven to be ∞-differentially private.

**A safe sparse vector implementation.** Algorithm 2 can be proven ε-d.p. if we replace line 7 with o[i] ← ⊤. Let us consider a proof of this statement for n = 5. CRSE will try to prove the following postconditions: o₁ = [⊤, ⊥, ..., ⊥] ⟹ o₂ = [⊤, ⊥, ..., ⊥] ∧ ε_c ≤ ε, ..., o₁ = [⊥, ..., ⊥, ⊤] ⟹ o₂ = [⊥, ..., ⊥, ⊤] ∧ ε_c ≤ ε. The only interesting iteration is the i-th one; in all the others the postcondition is vacuously true. The budget spent for the threshold will be k₀ · (ε/2). For all the other sampling instructions we can spend 0 by setting k_j = q[j](D₁) − q[j](D₂) for j ≠ i, that is, by coupling ŝ₁ + k_j = ŝ₂ with k_j = q[j](D₁) − q[j](D₂), spending |k_j + q[j](D₂) − q[j](D₁)| = 0. At the i-th iteration the samples are coupled as ŝ₁ + kᵢ = ŝ₂ with kᵢ = 1. So if ŝ₁ ≥ t̂₁ then also ŝ₂ ≥ t̂₂, and likewise if ŝ₁ < t̂₁ then also ŝ₂ < t̂₂. This implies that at the i-th iteration we enter the true branch on the right run iff we enter the true branch on the left one, spending |kᵢ + q[i](D₂) − q[i](D₁)| · (ε/4) ≤ 2 · (ε/4). The total privacy budget spent will then be equal to ε.
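The fix just described corresponds to the textbook sparse-vector technique: line 7 reports only the flag ⊤, never the sample. A hedged sketch of our own (using the ε/2 threshold and ε/4 per-query noise scales discussed above):

```python
import math
import random

def lap(scale):
    # Sample Laplace(0, scale) via inverse CDF.
    u = random.random() - 0.5
    return -scale * math.copysign(math.log(1 - 2 * abs(u)), u)

def safe_svt(queries, threshold, eps):
    """Safe sparse-vector sketch: the true branch now writes the flag
    "top" instead of the noisy sample, so only the fact that the noisy
    threshold was crossed is released."""
    t_hat = threshold + lap(2 / eps)      # eps/2 for the threshold
    out = []
    for q in queries:
        s_hat = q + lap(4 / eps)          # eps/4 per compared query
        if s_hat >= t_hat:
            out.append("top")             # release only the flag
            break
        out.append("bot")
    return out

random.seed(1)
result = safe_svt([0.1, 0.2, 0.3, 0.4, 5.0], threshold=1.0, eps=1.0)
print(result)
```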

# **9 Related Works**

There is now a wide array of formal techniques for reasoning about differential privacy, e.g. [1–6, 12, 15, 18, 20–24, 26, 27]. We discuss here the techniques that are closest to our work. In [1] the authors devised a synthesis framework to automatically discover proofs of privacy using coupling rules similar to ours. However, their approach is based not on relational symbolic execution but on a synthesis technique. Moreover, their framework cannot be directly used to find violations of differential privacy. In [2] the authors devise a decision logic for differential privacy which can soundly prove or disprove differential privacy. The programs considered there do not allow assignments to real and integer variables inside the body of while loops. While their technique is different from ours, their logic could potentially be integrated in our framework as a decision procedure. In the recent concurrent work [23], the authors propose an automated technique for proving or finding violations of differential privacy based on program analysis, standard symbolic execution, and the notion of randomness alignment, which in their approach plays the role that approximate coupling plays for us here. Their approach focuses on efficiency and scalability, while we focus here more on the foundational aspects of our technique.

Another recent concurrent work [27] combines testing based on (unary) symbolic execution with approximate coupling for proving and finding violations of differential privacy. Their symbolic execution engine is similar to our SPFOR, and it is used to reduce the number of tests that need to be generated and to build privacy proofs from concrete executions. Their approach relies more directly on testing, providing an approximate notion of privacy. As discussed in their paper, this could potentially be mitigated by using a relational symbolic execution engine like the one we propose here, at the cost of using more complex constraints. Another related work is [15], which proposes model checking for finding counterexamples to differential privacy. The main difference with our work is in the basic technique and in the fact that model checking reasons about a model of the code, rather than the code itself. They also consider the above-threshold example, and they are able to handle only a finite number of iterations.

Other work has studied how to find violations of differential privacy through testing [5, 6]. The approaches proposed in [5, 6] differ from ours in two ways: first, they use a statistical approach; second, they look at concrete values of the data and the privacy parameters. By using symbolic execution we are able to reason about symbolic values, and so consider ε-differential privacy for any finite ε. Moreover, our technique does not need sampling, although we still need to compute distributions to confirm a violation. Our work can be seen as a probabilistic extension of the framework presented in [10], where sampling instructions in the relational symbolic semantics are handled through rules inspired by the logic apRHL⁺ [3]. This logic can be used to prove differential privacy but does not directly help in finding counterexamples when the program is not private.

# **10 Conclusion**

We presented CRSE: a symbolic execution engine framework integrating relational reasoning and probabilistic couplings. The framework allows both proving and refuting differential privacy. When proving, CRSE can be seen as a strong-postcondition calculus. When refuting, CRSE uses several strategies to isolate potentially dangerous traces. Future work includes interfacing CRSE more efficiently with numeric solvers to find maxima of ratios of probabilities of traces.

Acknowledgements We warmly thank the reviewers for helping us improve the paper. This work was supported by the National Science Foundation under Grants No. 1565365, 1565387 and 2040215.

# **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/ 4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Graded Hoare Logic and its Categorical Semantics**

Marco Gaboardi<sup>1</sup>, Shin-ya Katsumata<sup>2</sup>, Dominic Orchard<sup>3</sup>, and Tetsuya Sato<sup>4</sup>

<sup>1</sup> Boston University, Boston, USA gaboardi@bu.edu

<sup>2</sup> National Institute of Informatics, Tokyo, Japan s-katsumata@nii.ac.jp

<sup>3</sup> University of Kent, Canterbury, United Kingdom d.a.orchard@kent.ac.uk

<sup>4</sup> Tokyo Institute of Technology, Tokyo, Japan tsato@c.titech.ac.jp

**Abstract.** Deductive verification techniques based on program logics (i.e., the family of Floyd-Hoare logics) are a powerful approach for program reasoning. Recently, there has been a trend of increasing the expressive power of such logics by augmenting their rules with additional information to reason about program side-effects. For example, general program logics have been augmented with cost analyses, logics for probabilistic computations have been augmented with estimate measures, and logics for differential privacy with indistinguishability bounds. In this work, we unify these various approaches via the paradigm of grading, adapted from the world of functional calculi and semantics. We propose Graded Hoare Logic (GHL), a parameterisable framework for augmenting program logics with a preordered monoidal analysis. We develop a semantic framework for modelling GHL such that grading, logical assertions (pre- and post-conditions) and the underlying effectful semantics of an imperative language can be integrated together. Central to our framework is the notion of a graded category which we extend here, introducing graded Freyd categories which provide a semantics that can interpret many examples of augmented program logics from the literature. We leverage coherent fibrations to model the base assertion language, and thus the overall setting is also fibrational.

# **1 Introduction**

The paradigm of grading is an emerging approach for augmenting language semantics and type systems with fine-grained information [40]. For example, a graded monad provides a mechanism for embedding side-effects into a pure language, exactly as in the approach of monads, but where the types are augmented ("graded") with information about what effects may occur, akin to a type-and-effect system [24,42]. As another example, graded comonadic type operators in linear type systems can capture non-linear dataflow and properties of data use [7,16,44]. In general, graded types augment a type system with some algebraic structure which serves to give a parameterisable fine-grained program analysis capturing the underlying structure of a type theory or semantics.

c The Author(s) 2021

N. Yoshida (Ed.): ESOP 2021, LNCS 12648, pp. 234–263, 2021. https://doi.org/10.1007/978-3-030-72019-3_9

Much of the work in graded types has arisen in conjunction with categorical semantics, in which graded modal type operators are modelled via graded monads [13,17,25,36,33], graded comonads (often with additional graded monoidal structure) [7,16,25,43,44], graded 'joinads' [36], graded distributive laws between graded (co)monads [15], and graded Lawvere theories [27].

So far grading has mainly been employed to reason about functional languages and calculi, thus the structure of the λ-calculus has dictated the structure of categorical models (although some recent work connects graded monads with classical dataflow analyses on CFGs [21]). We investigate here the paradigm of grading instead applied to imperative languages. As it happens, there is already a healthy thread of work in the literature augmenting program logics (in the family of Floyd-Hoare logics) with analyses that resemble notions of grading seen more recently in the functional world. The general approach is to extend the power of deductive verification by augmenting program logic rules with an analysis of side effects, tracked by composing rules. For example, work in the late 1980s and early 1990s augmented program logics with an analysis of computation time, accumulating a cost measure [37,38], with more recent fine-grained resource analysis based on multivariate analysis associated to program variables [8]. As another example, the Union Bound Logic of Barthe et al. [5] defines a Hoare-logic-style system for reasoning about probabilistic computations, with judgments ⊢<sub>β</sub> c : φ ⇒ ψ for a program c annotated by the maximum probability β (the union bound) that ψ does not hold. The inference rules of Union Bound Logic track and compute the union bound alongside the standard rules of Floyd-Hoare logic. As a last example, Approximate Relational Hoare Logic [2,6,39,48] augments a program logic with measures of the ε-δ bounds for reasoning about differential privacy.

In this work, we show how these disparate approaches can be unified by adapting the notion of grading to an imperative program-logic setting, for which we propose Graded Hoare Logic (GHL): a parameterisable program logic and reasoning framework graded by a preordered monoidal analysis. Our core contribution is GHL's underlying semantic framework which integrates grading, logical assertions (pre- and post-conditions) and the effectful semantics of an imperative language. This framework allows us to model, in a uniform way, the different augmented program logics discussed above.

Graded models of functional calculi tend to adopt either a graded monadic or graded comonadic model, depending on the direction of information flow in the analysis. We use the opportunity of an imperative setting (where the λ-calculus' asymmetrical 'many-inputs-to-one-output' model is avoided) to consider a more flexible semantic basis of graded categories. Graded categories generalise graded (co)monadic approaches, providing a notion of graded denotation without imposing on the placement (or 'polarity') of grading.

**Outline.** Section 2 begins with an overview of the approach, focusing on the example of Union Bound Logic and highlighting the main components of our semantic framework. The next three sections then provide the central contributions:


An extended version of this paper provides appendices which include further examples and proof details [14].

# **2 Overview of GHL and Prospectus of its Model**

As discussed in the introduction, several works explore Hoare logics combined with some form of implicit or explicit grading for program analysis. Our aim is to study these in a uniform way. We informally introduce our approach here.

We start with an example which can be derived in Union Bound Logic [5]:

$$\vdash_{0.05} \{\top\} \; \mathtt{do}\; v_1 \leftarrow \mathtt{Gauss}(0,1);\ \mathtt{do}\; v_2 \leftarrow \mathtt{Gauss}(0,1);\ v := \mathtt{max}(v_1, v_2) \; \{v \le 2\}$$

This judgment has several important components. First, we have primitives for procedures with side-effects such as do $v_1 \leftarrow$ Gauss(0, 1). This procedure samples a random value from the standard normal distribution with mean 0 and variance 1 and stores the result in the variable $v_1$. This kind of procedure with side effects differs from a regular assignment such as $v := \max(v_1, v_2)$, which is instead considered to be pure (wrt. probabilities) in our approach.

The judgment has grade '0.05', which expresses a bound on the probability that the postcondition is false, under the assumption of the precondition, after executing the program; we can think of it as the probability of failing to guarantee the postcondition. In our example (call it program P), since the precondition is true, this can be expressed as: $\Pr_{[\![P]\!](m)}[v > 2] \le 0.05$, where $[\![P]\!](m)$ is the probability distribution generated by executing the program on an initial memory m. The grade of P in this logic is derived using three components. First, sequential composition:

$$\frac{\vdash\_{\beta} \left\{ \psi \right\} P\_1 \left\{ \psi\_1 \right\} \quad \vdash\_{\beta'} \left\{ \psi\_1 \right\} P\_2 \left\{ \phi \right\}}{\vdash\_{\beta+\beta'} \left\{ \psi \right\} P\_1; P\_2 \left\{ \phi \right\}}$$

which sums the failure probabilities. Second, an axiom for Gaussian distribution:

$$\vdash_{0.025} \{\top\}\ \mathtt{do}\; v \leftarrow \mathtt{Gauss}(0,1)\ \{v \le 2\}$$

with a basic constant 0.025, which comes from the property of the Gaussian distribution we are considering. Third, by the following judgment, which is derivable using the assignment and consequence rules of standard Hoare Logic, with the trivial grading 0, the unit of addition:

$$\vdash_0 \{v_1 \le 2 \lor v_2 \le 2\}\ v := \max(v_1, v_2)\ \{v \le 2\}$$
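Putting the three components together, the grade of the whole program is 0.025 + 0.025 + 0 = 0.05. As a quick sanity check, assuming Gauss(0, 1) is the standard normal, Python's `statistics.NormalDist` confirms the tail bound behind the axiom's constant:

```python
from statistics import NormalDist

# Tail probability Pr[N(0,1) > 2], justifying the axiom's grade 0.025
tail = 1 - NormalDist(mu=0, sigma=1).cdf(2)
assert tail <= 0.025  # the true tail is approx. 0.0228

# Sequential composition sums failure probabilities (union bound):
# two Gauss samples (0.025 each) plus a pure assignment (grade 0)
grade = 0.025 + 0.025 + 0
assert abs(grade - 0.05) < 1e-12
```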

Judgments for more complex examples can be derived using the rules for conditional and loops. These rules also consider grading, and the grading can depend on properties of the program. For example the rule for conditionals is:

$$\frac{\vdash\_{\beta} \left\{ \psi \land e\_{b} = \mathtt{tt} \right\} P\_{1} \left\{ \phi \right\} \quad \vdash\_{\beta} \left\{ \psi \land e\_{b} = \mathtt{ff} \right\} P\_{2} \left\{ \phi \right\}}{\vdash\_{\beta} \left\{ \psi \right\} \text{ if } e\_{b} \text{ then } P\_{1} \text{ else } P\_{2} \left\{ \phi \right\}} $$

This allows one to reason also about the grading in a conditional way, through the two assumptions $\psi \land e_b = \mathtt{tt}$ and $\psi \land e_b = \mathtt{ff}$. We give more examples later.

Other logics share a similar structure to that described above for the Union Bound logic, for example the relational logic apRHL [2] and its variants [48,49] for reasoning about differential privacy. Others again use a similar structure implicitly, for example the Hoare Logic for reasoning about asymptotic execution cost by Nielson [37], Quantitative Hoare Logic [8], or the relational logic for reasoning about program counter security presented by Barthe [3].

To study the semantics of these logics in a uniform way, we first abstract the logic itself. We design a program logic, which we call Graded Hoare Logic (GHL), containing all the components discussed above. In particular, the language is a standard imperative language with conditionals and loops. Since our main focus is studying the semantics of grading, for simplicity we avoid using a 'while' loop, using instead a bounded 'loop' operation (loop e do P). This allows us to focus on the grading structures for total functions, leaving the study of the interaction between grading and partiality to future work. The language is parametric in the operations that are supported in expressions—common in several treatments of Hoare Logic—and in a set of procedures and commands with side effects, which are the main focus of our work. GHL is built over this language and an assertion logic which is parametric in the basic predicates that can be used to reason about programs. GHL is also parametric in a preordered monoid of grades, and in the axioms associated with basic procedures and commands with side effects. This generality is needed in order to capture the different logics we mentioned before.

GHL gives us a unified syntax, but our real focus is the semantics. To be as general as possible we turn to the language of category theory. We give a categorical framework which can capture different computational models and side effects, with denotations that are refined by predicates and grades describing program behaviours. Our framework relates different categories (modelling different aspects of GHL) as summarized by the following informal diagram (1).

$$\begin{array}{ccc} \mathbb{P} & \xrightarrow{\ \dot{I}\ } & \mathbb{E} \\ {\scriptstyle p}\big\downarrow & & \big\downarrow{\scriptstyle q} \\ \mathbb{V} & \xrightarrow{\ I\ } & \mathbb{C} \end{array} \tag{1}$$

This diagram should not be understood as a commutative diagram in **CAT** as E is a graded category and hence not an object of **CAT**.

The category V models values and pure computations, the category C models impure computations, P is a category of predicates, and E is a graded category whose hom-sets are indexed by grades—elements of a preordered monoid. The presentation of graded categories is new here, but has some relation to other structures of the same name (discussed in Section 4).

This diagram echoes the principle of refinement as functors proposed by Melliès and Zeilberger [32]. The lower part of the diagram offers an interpretation of the language, while the upper part offers a logical refinement of programs with grading. However, our focus is to introduce a new graded refinement view. The ideas we use to achieve this are to interpret the base imperative language using a Freyd category $I : \mathbb{V} \to \mathbb{C}$ (traditionally used to model effects) with countable coproducts, to interpret the assertion logic with a coherent fibration $p : \mathbb{P} \to \mathbb{V}$, and to interpret GHL as a graded Freyd category $\dot{I} : \mathbb{P} \to \mathbb{E}$ with homogeneous coproducts. In addition, the graded category $\mathbb{E}$ has a functor<sup>5</sup> $q$ into $\mathbb{C}$ which erases assertions and grades and extracts the denotation of effectful programs, in the spirit of refinements. The benefit of using a Freyd category as a building block is that Freyd categories are more flexible than other structures (e.g., monads) for constructing models of computational effects [47,51]. For instance, in the category **Meas** of measurable spaces and measurable functions, we cannot define state monads since there are no exponential objects. However, we can still have a model of first-order effectful computations using Freyd categories [46].

Graded Freyd categories are a new categorical structure that we designed for interpreting GHL judgments (Section 4.2). The major difference from an ordinary Freyd category is that the 'target' category is now a graded category (E in the diagram (1)). The additional structure provides what we need in order to interpret judgments including grading.

To show the generality of this structure, we present several approaches to instantiating the categorical framework of GHL's semantics, showing constructions via graded monads and graded comonads preserving coproducts.

Part of the challenge in designing a categorical semantics for GHL is to carve out and implement the implicit assumptions and structures used in the semantics of the various Hoare logics. A representative example of this challenge is the interpretation of the rule for conditionals in Union Bound Logic that we introduced above. We interpret the assertion logic in (a variant of) coherent fibrations $p : \mathbb{P} \to \mathbb{V}$, which model the ∧∨∃=-fragment of first-order predicate logic [22]. In this abstract setup, the rule for conditionals may become unsound, as it is built on the implicit assumption that the type Bool, which is interpreted as 1 + 1, consists only of two elements; but this may fail in a general $\mathbb{V}$. For example, a suitable coherent fibration for relational Hoare logic would take **Set**<sup>2</sup> as the base category, but we have **Set**<sup>2</sup>(1, 1 + 1) ≅ 4, meaning that there are four global elements in the interpretation of Bool. We resolve this problem by introducing

<sup>5</sup> More precisely, this is not quite a functor because E is a graded category; see Definition 9 for the precise meaning.

a side condition to guarantee the decidability of the boolean expression:

$$\frac{\vdash_m \{\psi \land e_b = \mathtt{tt}\}\ P_1\ \{\phi\} \quad \vdash_m \{\psi \land e_b = \mathtt{ff}\}\ P_2\ \{\phi\} \quad \psi \vdash e_b = \mathtt{tt} \lor e_b = \mathtt{ff}}{\vdash_m \{\psi\}\ \mathtt{if}\ e_b\ \mathtt{then}\ P_1\ \mathtt{else}\ P_2\ \{\phi\}}$$

This is related to the synchronization condition appearing in the relational Hoare logic rule for conditional commands (e.g., [6]).

Another challenge in the design of the GHL is how to assign a grade to the loop command loop e do P. We may naïvely give it the grade $m^l = \bigvee_{i \in \mathbb{N}} m^i$, where m is the grade of P, because P is repeatedly executed some finite number of times. However, the grade $m^l$ is a very loose over-approximation of the grade of loop e do P. Even if we obtain some knowledge about the iteration count e in the assertion logic, this cannot be reflected in the grade. To overcome this problem, we introduce a Hoare logic rule that can estimate a more precise grade of loop e do P, provided that the value of e is determined:

$$\frac{\forall 0 \le z < N. \ \vdash_m \{\psi_{z+1}\}\ P\ \{\psi_z\} \quad \psi_N \vdash e_n = \lceil N \rceil}{\vdash_{m^N} \{\psi_N\}\ \mathtt{loop}\ e_n\ \mathtt{do}\ P\ \{\psi_0\}}$$

This rule brings together the assertion language and grading, creating a dependency from the former to the latter, and giving us the structure needed for a categorical model. The right premise is a judgment of the assertion logic (under program variables $\Gamma_M$ and pre-condition $\psi_N$) requiring that $e_n$ is statically determinable as N. This premise makes the rule difficult to use in practical applications where $e_n$ is dynamic. We expect a more "dependent" version of this rule is possible with a more complex semantics internalizing some form of data-dependency. Nevertheless, the above is enough to study the semantics of grading and its interaction with the Hoare Logic structure, which is our main goal here.

# **3 Loop Language and Graded Hoare Logic**

After introducing some notation and basic concepts used throughout, we outline a core imperative loop language, parametric in its set of basic commands and procedures (Section 3.2). We then define a template of an assertion logic (Section 3.3), which is the basis of Graded Hoare Logic (Section 3.4).

### **3.1 Preliminaries**

Throughout, we fix an infinite set **Var** of variables which are employed in the loop language (as the names of mutable program variables) and in logic (to reason about these program variables).

A many-sorted signature Σ is a tuple (S, O, ar) where S, O are sets of sorts and operators, and $ar : O \to S^+$ assigns argument sorts and a return value sort to operators (where $S^+$ is the set of non-empty sequences of sorts, i.e., an operator o with signature $(s_1 \times \cdots \times s_n) \to s$ is summarized as $ar(o) = s_1, \ldots, s_n, s \in S^+$). We say that another many-sorted signature Σ′ = (S′, O′, ar′) is an extension of Σ if S ⊆ S′ and O ⊆ O′ and ar′(o) = ar(o) for all o ∈ O.

Let Σ = (S, ···) be a many-sorted signature. A context for Σ is a (possibly empty) sequence of pairs Γ ∈ (**Var**×S)<sup>∗</sup> such that all variables in Γ are distinct. We regard Γ as a partial mapping from **Var** to S. The set of contexts for Σ is denoted **Ctx**Σ. For s ∈ S and Γ ∈ **Ctx**Σ, we denote by **Exp**Σ(Γ, s) the set of Σ-expressions of sort s under the context Γ. When Σ,Γ are obvious, we simply write e : s to mean e ∈ **Exp**Σ(Γ, s). This set is inductively defined as usual.

An interpretation of a many-sorted signature Σ = (S, O, ar) in a cartesian category (V, <sup>1</sup>, <sup>×</sup>) consists of an assignment of an object [[s]] <sup>∈</sup> <sup>V</sup> for each sort <sup>s</sup> <sup>∈</sup> <sup>S</sup> and an assignment of a morphism [[o]] <sup>∈</sup> <sup>V</sup>([[s1]] × ··· × [[sn]], [[s]]) for each o ∈ O such that ar(o) = s1,...,sn, s. Once such an interpretation is given, we extend it to Σ-expressions in the standard way (see, e.g. [9,45]). First, for a context Γ = x<sup>1</sup> : s1, ··· , x<sup>n</sup> : s<sup>n</sup> ∈ **Ctx**Σ, by [[Γ]] we mean the product [[s1]]×···×[[sn]]. Then we inductively define the interpretation of e ∈ **Exp**Σ(Γ, s) as a morphism [[e]] <sup>∈</sup> <sup>V</sup>([[Γ]], [[s]]).
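The inductive interpretation of expressions can be made concrete with a small sketch over **Set**, where an expression in context Γ denotes a function from environments to values. The tuple-based AST and the operator table are illustrative assumptions, not the paper's syntax:

```python
# Operators of the signature and their interpretations as functions.
OPS = {"+": lambda x, y: x + y, "max": max}

def interp(expr, env):
    """Interpret a Sigma-expression compositionally.

    expr is either a variable name (str), a literal ('lit', k),
    or an operator application (op, *args)."""
    if isinstance(expr, str):          # variable: project from environment
        return env[expr]
    head, *args = expr
    if head == "lit":                  # embedded constant [k]
        return args[0]
    return OPS[head](*(interp(a, env) for a in args))

# [[max(x, y) + 1]] in the environment {x: 2, y: 5}
assert interp(("+", ("max", "x", "y"), ("lit", 1)), {"x": 2, "y": 5}) == 6
```

The environment plays the role of a point of the product $[\![\Gamma]\!]$, and each clause mirrors one case of the standard inductive definition.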

Throughout, we write bullet-pointed lists for the mathematical data that are parameters to Graded Hoare Logic (introduced in Section 3.4).

### **3.2 The Loop Language**

We introduce an imperative language called the loop language, with a finite looping construct. The language is parameterised by the following data:

- a many-sorted signature Σ = (S, O, ar) extending a base signature (S<sub>0</sub>, O<sub>0</sub>, ar<sub>0</sub>) with sorts S<sub>0</sub> = {bool, nat} and essential constants as base operators O<sub>0</sub>, shown here with their signatures for brevity rather than defining ar<sub>0</sub> directly:

$$O\_0 = \{ \mathtt{tt} : \mathtt{bool}, \mathtt{ff} : \mathtt{bool} \} \cup \{ [k] : \mathtt{nat} \mid k \in \mathbb{N} \}$$

where bool is used for branching control-flow and nat is used for controlling loops, whose syntactic constructs are given below. We write $\lceil k \rceil$ to mean the embedding of semantic natural numbers into the syntax.

- a set **CExp** of command names (ranged over by c) and a set **PExp**<sub>s</sub> of procedure names of sort s (ranged over by p) for each sort s ∈ S.

When giving a program, we first fix a context Γ<sup>M</sup> for the program variables. We define the set of programs (under a context ΓM) by the following grammar:

$$P ::= P ; P \mid \mathtt{skip} \mid v := e \mid \mathtt{do}\, c \mid \mathtt{do}\, v \leftarrow p \mid \mathtt{if}\, e_b\, \mathtt{then}\, P\, \mathtt{else}\, P \mid \mathtt{loop}\, e_n\, \mathtt{do}\, P$$

where $v \in \Gamma_M$, and $e_b, e_n$ are well-typed Σ-expressions of sort bool and nat under $\Gamma_M$, and c ∈ **CExp**. In assignment commands, $e \in \mathbf{Exp}_\Sigma(\Gamma_M, \Gamma_M(v))$. In procedure call commands, $p \in \mathbf{PExp}_{\Gamma_M(v)}$. Each program must be well-typed under $\Gamma_M$. The typing rules are routine so we omit them.

Thus, programs can be sequentially composed via ; with skip as the trivial program which acts as a unit to sequencing. An assignment v := e assigns expressions to a program variable v. Commands can be executed through the instruction do c which yields some side effects but does not return any value. Procedures can be executed through a similar instruction do v ← p which yields some side effect but also returns a value which is used to update v. Finally, conditionals are guarded by a boolean expression e<sup>b</sup> and the iterations of a looping construct are given by a natural number expression e<sup>n</sup> (which is evaluated once at the beginning of the loop to determine the number of iterations).
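This operational reading can be sketched as a small interpreter; note that the iteration count $e_n$ is evaluated once, up front. The tuple-based AST is an illustrative assumption, and commands/procedures are omitted since their semantics is a parameter of the language:

```python
def run(prog, mem, eval_expr):
    """Execute a loop-language program, mutating the memory `mem`."""
    tag = prog[0]
    if tag == "skip":
        pass
    elif tag == "seq":                 # P1 ; P2
        run(prog[1], mem, eval_expr)
        run(prog[2], mem, eval_expr)
    elif tag == "assign":              # v := e
        mem[prog[1]] = eval_expr(prog[2], mem)
    elif tag == "if":                  # if e_b then P1 else P2
        branch = prog[2] if eval_expr(prog[1], mem) else prog[3]
        run(branch, mem, eval_expr)
    elif tag == "loop":                # loop e_n do P: e_n evaluated ONCE
        n = eval_expr(prog[1], mem)
        for _ in range(n):
            run(prog[2], mem, eval_expr)

# Toy expression evaluator: variables, callables, or constants
ev = lambda e, m: m[e] if isinstance(e, str) else e(m) if callable(e) else e

mem = {"x": 0, "n": 3}
run(("loop", "n", ("assign", "x", lambda m: m["x"] + 1)), mem, ev)
assert mem["x"] == 3   # body ran exactly n = 3 times
```

Because `n` is read before the loop starts, updating `n` inside the body would not change the number of iterations, matching the bounded-loop semantics described above.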

This language is rather standard, except for the treatment of commands and procedures of which we give some examples here.

Example 1. Cost Information: a simple example of a command is tick, which yields as a side effect the recording of one 'step' of computation.

Control-Flow Information: two other simple examples of commands are cfTT and cfFF, which yield as side effects the recording of either true or false to a log. A program can be augmented with these commands in its branches to give an account of a program's control flow. We will use these commands to reason about control-flow security in Example 3.

Probability Distributions: a simple example of a procedure is Gauss(x, y), which yields as a side effect the introduction of new randomness in the program, and which returns a random sample from the Gaussian distribution with mean and variance specified by x, y ∈ ΓM. We will see how to use this procedure to reason about probability of failure in Example 4.

Concrete instances of the loop language typically include conversion functions between the sorts in Σ, e.g., so that programs can dynamically change control flow depending on values of program variables. In other instances, we may have a language manipulating richer data types, e.g., reals or lists, and also procedures capturing higher-complexity computations, such as Ackermann functions.

### **3.3 Assertion Logic**

We use an assertion logic to reason about properties of basic expressions. We regard this reasoning as a meta-level activity, thus the logic can have more sorts and operators than the loop language. Thus, over the data specifying the loop language, we build formulas of the assertion logic by the following data:


The assertion logic is a fragment of the many-sorted first-order logic over $\Sigma_l$-terms admitting: 1) finite conjunctions, 2) countable disjunctions, 3) existential quantification, and 4) equality predicates. Judgements in the assertion logic have the form $\Gamma \mid \psi_1, \cdots, \psi_n \vdash \phi$ (read as $\psi_1 \land \cdots \land \psi_n$ implies $\phi$), where $\Gamma \in \mathbf{Ctx}_{\Sigma_l}$ is a context giving types to variables in the formulas $\psi_1, \cdots, \psi_n, \phi \in \mathbf{Fml}_{\Sigma_l}(\Gamma)$. The logic has the axiom rule deriving $\Gamma \mid \psi \vdash \phi$ for each pair $(\psi, \phi)$ of formulas in $\mathbf{Axiom}(\Gamma)$. The rest of the inference rules of this logic are fairly standard and so we omit them (see e.g. [22, Section 3.2 and Section 4.1]).

The set $\mathbf{Fml}_{\Sigma_l}(\Gamma)$ of formulas under $\Gamma \in \mathbf{Ctx}_{\Sigma_l}$ is inductively defined by the formation rules in Fig. 1.


#### **Fig. 1.** Formula formation rules

In some of our examples we will use the assertion logic to reason about programs in a relational way, i.e., to reason about two executions of a program (we call them the left and right executions). This requires basic predicates to manage expressions representing pairs of values in our assertion logic. As an example, we could have two predicates $\mathrm{eqv}_1$, $\mathrm{eqv}_2$ that assert the equality of the left and right executions of an expression to some value, respectively. That is, the formula $\mathrm{eqv}_1(e_b, \mathtt{true})$, which we will write using the infix notation $e_b\langle 1\rangle = \mathtt{true}$, asserts that the left execution of the boolean expression $e_b$ is equal to true.

### **3.4 Graded Hoare Logic**

We now introduce Graded Hoare Logic (GHL), specified by the following data: a pomonoid $(M, \le, 1, \cdot)$ of grades, together with two functions assigning to specifications the commands and procedures that satisfy them:

$$\begin{aligned} &C_{\mathsf{c}} : \mathbf{Fml}_{\Sigma_l}(\Gamma_{\mathsf{M}}) \times M \to 2^{\mathbf{CExp}}\\ &C_{\mathsf{p}}^{s} : \mathbf{Fml}_{\Sigma_l}(\Gamma_{\mathsf{M}}) \times M \times \mathbf{Fml}_{\Sigma_l}(r:s) \to 2^{\mathbf{PExp}_s} \quad (s \in S,\ r \notin \mathsf{dom}(\Gamma_{\mathsf{M}})) \end{aligned}$$

The function $C_{\mathsf{c}}$ takes a pre-condition and a grade, returning a set of command symbols satisfying these specifications. A command c may appear in $C_{\mathsf{c}}(\phi, m)$ for different pairs $(\phi, m)$, enabling pre-condition-dependent grades to be assigned to c. Similarly, the function $C_{\mathsf{p}}^s$ takes a pre-condition, a grade, and a post-condition for return values, and returns a set of procedure names of sort s satisfying these specifications. Note, r is a distinguished variable (for return values) not in $\Gamma_M$. The shape of $C_{\mathsf{c}}$ and $C_{\mathsf{p}}$ as predicates over commands and procedures, indexed by assertions and grades, provides a way to link grades and assertions for the effectful operations of GHL. Section 3.5 gives examples exploiting this.

From this structure we define a graded Hoare logic by judgments of the form $\vdash_m \{\phi\}\ P\ \{\psi\}$, denoting a program P with pre-condition $\phi \in \mathbf{Fml}_{\Sigma_l}(\Gamma_M)$, post-condition $\psi \in \mathbf{Fml}_{\Sigma_l}(\Gamma_M)$ and analysis $m \in M$. Graded judgments are defined inductively via the inference rules given in Table 1. Ignoring grading, many of the rules are fairly standard for a Floyd-Hoare program logic. The rule for skip is standard but includes grading by the unit 1 of the monoid. Similarly, assignment

$$\begin{array}{c}
\vdash_1 \{\psi\}\ \mathtt{skip}\ \{\psi\}
\qquad
\dfrac{\vdash_m \{\psi\}\ P_1\ \{\psi_1\} \quad \vdash_{m'} \{\psi_1\}\ P_2\ \{\phi\}}{\vdash_{m \cdot m'} \{\psi\}\ P_1 ; P_2\ \{\phi\}}
\qquad
\vdash_1 \{\psi[e/v]\}\ v := e\ \{\psi\}
\\[2ex]
\dfrac{c \in C_{\mathsf{c}}(\psi, m)}{\vdash_m \{\psi\}\ \mathtt{do}\ c\ \{\psi\}}
\qquad
\dfrac{p \in C_{\mathsf{p}}^{\Gamma_{\mathsf{M}}(v)}(\psi, m, \phi)}{\vdash_m \{\psi\}\ \mathtt{do}\ v \leftarrow p\ \{(\exists v : \Gamma_{\mathsf{M}}(v)\,.\ \psi) \land \phi[v/r]\}}
\\[2ex]
\dfrac{\Gamma_{\mathsf{M}} \mid \psi' \vdash \psi \quad m \le m' \quad \Gamma_{\mathsf{M}} \mid \phi \vdash \phi' \quad \vdash_m \{\psi\}\ P\ \{\phi\}}{\vdash_{m'} \{\psi'\}\ P\ \{\phi'\}}
\qquad
\dfrac{\forall 0 \le z < N.\ \vdash_m \{\psi_{z+1}\}\ P\ \{\psi_z\} \quad \Gamma_{\mathsf{M}} \mid \psi_N \vdash e_n = \lceil N \rceil}{\vdash_{m^N} \{\psi_N\}\ \mathtt{loop}\ e_n\ \mathtt{do}\ P\ \{\psi_0\}}
\\[2ex]
\dfrac{\vdash_m \{\psi \land e_b = \mathtt{tt}\}\ P_1\ \{\phi\} \quad \vdash_m \{\psi \land e_b = \mathtt{ff}\}\ P_2\ \{\phi\} \quad \Gamma_{\mathsf{M}} \mid \psi \vdash e_b = \mathtt{tt} \lor e_b = \mathtt{ff}}{\vdash_m \{\psi\}\ \mathtt{if}\ e_b\ \mathtt{then}\ P_1\ \mathtt{else}\ P_2\ \{\phi\}}
\end{array}$$

**Table 1.** Graded Hoare Logic Inference Rules

is standard, but graded with 1 since we do not treat it specially in GHL. Sequential composition takes the monoid multiplication of the grades of the subterms. The rules for commands and procedures use the functions $C_{\mathsf{c}}$ and $C_{\mathsf{p}}$ introduced above. Notice that the rule for commands uses its pre-condition as its post-condition, since commands have only side effects and do not return any value. The rule for procedures combines the pre- and post-conditions given by $C_{\mathsf{p}}$, following the style of Floyd's assignment rule [12].

The non-syntax-directed consequence rule is similar to the usual consequence rule, and in addition allows the assumption on the grade to be weakened (approximated) according to the ordering of the monoid.

The shape of the loop rule is slightly different from the usual one. It uses the assertion-logic judgment $\Gamma_M \mid \psi_N \vdash e_n = \lceil N \rceil$ to express the assumption that $e_n$ evaluates to $\lceil N \rceil$. Under this assumption it uses a family of assertions $\psi_z$ indexed by the natural numbers $z \in \{0, 1, \ldots, N-1\}$ to conclude the post-condition $\psi_0$. This family of assertions plays the role of the classical invariant in the Floyd-Hoare logic rule for 'while'. Assuming that the grade of the loop body is m, the grade of the loop command is then $m^N$, where $m^0 = 1$ and $m^{k+1} = m \cdot m^k$. By instantiating this rule with $\psi_z = (\theta \land e_n = \lceil z \rceil)$, the loop rule also supports the following derived rule which is often preferable in examples:

$$\frac{\forall 0 \le z < N. \ \vdash_m \{\theta \land e_n = \lceil z+1 \rceil\}\ P\ \{\theta \land e_n = \lceil z \rceil\}}{\vdash_{m^N} \{\theta \land e_n = \lceil N \rceil\}\ \mathtt{loop}\ e_n\ \mathtt{do}\ P\ \{\theta \land e_n = \lceil 0 \rceil\}}$$
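The grade $m^N$ appearing in these loop rules is just an N-fold monoid product, computable for any instantiation of the pomonoid. A small sketch (the `grade_power` helper and the example monoids are illustrative):

```python
def grade_power(m, n, unit, op):
    """Compute m^n in a monoid: m^0 = unit, m^(k+1) = op(m, m^k)."""
    g = unit
    for _ in range(n):
        g = op(m, g)
    return g

# Cost monoid (N, <=, 0, +): a 5-iteration loop over a cost-1 body
assert grade_power(1, 5, 0, lambda a, b: a + b) == 5
# Word monoid ({tt,ff}*, prefix, empty word, concatenation)
assert grade_power("tt", 3, "", lambda a, b: a + b) == "tttttt"
```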

The rule for the conditional is standard except for the side condition $\Gamma_M \mid \psi \vdash e_b = \mathtt{tt} \lor e_b = \mathtt{ff}$. While this condition may seem obvious, it is actually important for making GHL sound in various semantics (mentioned in Section 2). As an example, suppose that a semantics $[\![-]\!]$ of expressions is given in the product category **Set**<sup>2</sup>, which corresponds to two semantics $[\![-]\!]_1, [\![-]\!]_2$ of expressions in **Set**. Then the side condition for the conditional guarantees that for any boolean expression $e_b$ and pair of memories $(\rho_1, \rho_2)$ satisfying the precondition $\psi$, the pair $([\![e_b]\!]_1(\rho_1), [\![e_b]\!]_2(\rho_2))$ is either $[\![\mathtt{tt}]\!] = (\mathtt{tt},\mathtt{tt})$ or $[\![\mathtt{ff}]\!] = (\mathtt{ff},\mathtt{ff})$. We note that other relational logics such as apRHL [6] employ an equivalent syntactic side condition in their rule for conditionals.

### **3.5 Example Instantiations of GHL**

Example 2 (Simple cost analysis). We can use the tick command discussed in Example 1 to instrument programs with cost annotations. We can then use GHL to perform cost analysis by instantiating GHL with the additive natural number monoid $(\mathbb{N}, \le, 0, +)$ and $\mathtt{tick} \in C_{\mathsf{c}}(\phi, 1)$. Thus, we can form judgments $\vdash_1 \{\phi\}\ \mathtt{do\ tick}\ \{\phi\}$ which account for cost via the judgment's grade. Sequential composition accumulates cost, and terms like skip and assignment have 0 cost.
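For straight-line programs in this instantiation, the grade can be read off by summing over the program structure, mirroring how the sequencing rule multiplies (here: adds) grades. A sketch (the tuple AST and the `static_grade` helper are illustrative assumptions):

```python
def static_grade(prog):
    """Grade of a straight-line program in the cost instantiation:
    tick has grade 1; skip and assignment have grade 0 (the unit);
    sequencing takes the monoid product, which here is addition."""
    if prog[0] == "tick":
        return 1
    if prog[0] == "seq":
        return static_grade(prog[1]) + static_grade(prog[2])
    return 0  # skip / assign

prog = ("seq", ("tick",), ("seq", ("assign", "x", 1), ("tick",)))
assert static_grade(prog) == 2   # two ticks, one free assignment
```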

Let us use this example to illustrate how $C_{\mathsf{c}}$ can assign multiple pre-condition-grade pairs to a command. Suppose that we modify the semantics of tick so that it reports unit cost 1 when variable x is 0, and otherwise cost 2. We can then define $C_{\mathsf{c}}$ so that $\mathtt{tick} \in C_{\mathsf{c}}(x = \lceil 0 \rceil, 1)$ and also $\mathtt{tick} \in C_{\mathsf{c}}(x \neq \lceil 0 \rceil, 2)$. In this way, we can give different grades to programs depending on their pre-conditions.

Example 3 (Program Counter Security). We can use the commands cfTT and cfFF discussed in Example 1 to instrument programs with control flow annotations, recording to an external log. GHL can then be used to reason about program counter security [35][3, Section 7.2] of instrumented programs. This is a relational security property similar to non-interference (requiring that private values do not influence public outputs) but where only programs with the same control flow are considered.

Firstly, any conditional statement $\mathtt{if}\ e_b\ \mathtt{then}\ P_t\ \mathtt{else}\ P_f$ in a program is elaborated to a statement $\mathtt{if}\ e_b\ \mathtt{then}\ (\mathtt{cfTT}; P_t)\ \mathtt{else}\ (\mathtt{cfFF}; P_f)$. We then instantiate GHL with the monoid of words over $\{\mathtt{tt}, \mathtt{ff}\}$ with the prefix order: $(\{\mathtt{tt}, \mathtt{ff}\}^*, \le, \epsilon, \cdot)$, and we consider $\mathtt{cfTT} \in C_{\mathsf{c}}(\phi, \mathtt{tt})$ and $\mathtt{cfFF} \in C_{\mathsf{c}}(\phi, \mathtt{ff})$. We can thus form judgments of the shape $\vdash_{\mathtt{tt}} \{\phi\}\ \mathtt{do\ cfTT}\ \{\phi\}$ and $\vdash_{\mathtt{ff}} \{\phi\}\ \mathtt{do\ cfFF}\ \{\phi\}$ which account for control-flow information (forming paths) via the judgment's grade. Sequential composition concatenates control-flow paths, and terms like skip and assignment do not provide any control-flow information, i.e., their grade is the empty word $\epsilon$.
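This word monoid with prefix order can be sketched directly, representing words as tuples of tokens (the helper names are illustrative):

```python
# Control-flow grade monoid ({tt,ff}*, prefix order, empty word, concat)
empty = ()                                   # grade of skip / assignment
concat = lambda a, b: a + b                  # sequencing concatenates paths
is_prefix = lambda a, b: b[:len(a)] == a     # the preorder on grades

# cfTT contributes ("tt",), cfFF contributes ("ff",)
path = concat(("tt",), concat(empty, ("ff",)))
assert path == ("tt", "ff")
assert is_prefix(("tt",), path)   # weakening along the prefix preorder
```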

We then instantiate the assertion logic to support relational reasoning, i.e., where the expressions of the language are interpreted as pairs of values. For an expression e interpreted as a pair $(v_1, v_2)$, we write $e\langle 1\rangle = v_1$ to say that the first component (left execution) equals $v_1$, and $e\langle 2\rangle = v_2$ to say that the second component (right execution) equals $v_2$. In the assertion logic, we can then describe public values which need to be equal, following the tradition in reasoning about non-interference, by the predicate $e\langle 1\rangle = e\langle 2\rangle$. Private data are instead interpreted as a pair of arbitrary values. (Section 3.3 suggested the notation $\mathrm{eqv}_i(e, b)$ for $e\langle i\rangle = b$, but we use the latter for compactness here.)

As an example, one can prove the following judgment where x is a public variable and y is a private one, and b ∈ {tt, ff}:

$$\vdash_b \{x\langle 1\rangle = x\langle 2\rangle \land x\langle 1\rangle = b\}\ \mathtt{if}\ x\ \mathtt{then}\ (\mathtt{cfTT}; x := 1; y := 1)\ \mathtt{else}\ (\mathtt{cfFF}; x := 2; y := 2)\ \{x\langle 1\rangle = x\langle 2\rangle\}$$

This judgment shows the program is non-interferent, since the value of x is independent from the value of the private variable y, and secure in the program counter model, since the control flow does not depend on the value of y. Conversely, the following judgment is not derivable for both b = tt and b = ff:

$$\vdash_b \{x\langle 1\rangle = x\langle 2\rangle \land y\langle 1\rangle = b\}\ \mathtt{if}\ y\ \mathtt{then}\ (\mathtt{cfTT}; x := 1; y := 1)\ \mathtt{else}\ (\mathtt{cfFF}; x := 1; y := 2)\ \{x\langle 1\rangle = x\langle 2\rangle\}$$

This program is non-interferent but is not secure in the program counter model because the control flow leaks information about y which is a private variable.

Example 4 (Union Bound Logic). Section 1 discussed the Union Bound logic by Barthe et al. [5]. This logic embeds smoothly into GHL by using the pomonoid $(\mathbb{R}_{\ge 0}, \le, 0, +)$ and procedures of the form $\mathtt{sample}_{\mu,e}$, read as samplings from a probability distribution μ parametrised over the syntax of GHL expressions e. Following Barthe et al. [5], we consider a semantically defined set for $C_{\mathsf{p}}$:

$$C_{\mathsf{p}}(\phi, \beta, \psi) = \{\mathtt{sample}_{\mu,e} \mid \forall s.\ s \in \lceil \phi \rceil \implies \Pr_{s' \leftarrow [\![\mathtt{sample}_{\mu,e}]\!](s)}[s' \in \lceil \neg\psi \rceil] \le \beta\}$$

This definition captures that, assuming the pre-condition holds for an input memory state s, then for an output value s′ sampled from $\mathtt{sample}_{\mu,e}$, the probability that the post-condition is false is bounded above by β. This allows us to consider different properties of the distribution μ with parameter e.
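For discrete distributions this membership condition can be checked exactly. A sketch (the `in_Cp` helper, the coin distribution, and the predicates are illustrative assumptions; distributions are modelled as outcome-to-probability dictionaries):

```python
def in_Cp(dist_of, pre, post, beta, states):
    """Semantic C_p check for a sampling procedure: over every state
    satisfying the precondition, the probability mass of outcomes
    violating the postcondition must be at most beta."""
    return all(
        sum(p for v, p in dist_of(s).items() if not post(v)) <= beta
        for s in states if pre(s)
    )

# A biased coin: samples 1 with probability 0.1, independent of the state
coin = lambda s: {1: 0.1, 0: 0.9}

# Postcondition "result is 0" fails with probability exactly 0.1
assert in_Cp(coin, lambda s: True, lambda v: v == 0, beta=0.1, states=[{}])
assert not in_Cp(coin, lambda s: True, lambda v: v == 0, beta=0.05, states=[{}])
```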

# **4 Graded Categories**

Now that we have introduced GHL and key examples, we turn to the core of its categorical semantics: graded categories.

Graded monads provide a notion of sequential composition for morphisms of the form I → TmJ, i.e., with structure on the target/output capturing some information by the grade m drawn from a pomonoid [24]; dually, graded comonads provide composition for DmI → J, i.e. with structure on the source/input with grade m [43]. We avoid the choice of whether to associate grading with the input or output by instead introducing graded categories, which are agnostic about the polarity (or position) of any structure and grading. Throughout this section, we fix a pomonoid (M, ≤, 1, ·) (with · monotonic wrt. ≤).

**Definition 1.** An M-graded category C consists of the following data:


Graded categories satisfy the usual categorical laws of identity and associativity, and also the commutativity of upcast and composition: $\uparrow_{n}^{n'}\!g \circ \uparrow_{m}^{m'}\!f = \uparrow_{m \cdot n}^{m' \cdot n'}(g \circ f)$, corresponding to monotonicity of $(\cdot)$ with respect to $\leq$.

An intuitive meaning of a graded category's morphisms is: $f \in \mathbb{C}(A, B)(m)$ if the value, or the price, of a morphism $f : A \to B$ is at most $m$ with respect to the ordering $\leq$ on $M$. We do not yet give a polarity or direction to this price, i.e., whether the price is consumed or produced by the computation. Thus, graded categories give a non-biased view; we need not specify whether grading relates to the source or target of a morphism.

Graded categories were first introduced by Wood [54, Section 1] (under the name 'large V-categories'), and Levy connected them with models of call-by-push-value [28]. We therefore do not claim novelty for Definition 1.

Example 5. A major source of graded categories is graded (co)monads. Let $(M, \leq, 1, \cdot)$ be a pomonoid, regarded as a monoidal category. A graded monad [50,24] on a category $\mathbb{C}$ (or more precisely an $M$-graded monad) is a lax monoidal functor $(T, \eta, \mu) : (M, \leq, 1, \cdot) \to ([\mathbb{C}, \mathbb{C}], \mathrm{Id}, \circ)$. Concretely, this specifies:


They satisfy the graded versions of the usual monad axioms:

$$\mu_{1,m,J} \circ \eta_{T_m J} = \mathrm{id}_{T_m J} = \mu_{m,1,J} \circ T_m(\eta_J), \qquad \mu_{m \cdot m', m'', J} \circ \mu_{m, m', T_{m''} J} = \mu_{m, m' \cdot m'', J} \circ T_m(\mu_{m', m'', J})$$

Graded comonads are defined dually (i.e., as a graded monad on $\mathbb{C}^{\mathrm{op}}$).
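The graded monad data above can be illustrated by a graded writer monad over the pomonoid $(\mathbb{N}, \leq, 0, +)$, a sketch of our own (all names hypothetical): $T_m A$ consists of pairs of a value and a log of length at most $m$, $\eta$ produces an empty log at the unit grade, and $\mu$ concatenates logs while adding the grades.

```python
# Sketch: the graded writer monad over (N, <=, 0, +).  T_m A is pairs
# (a, log) whose log has length at most m; eta yields an empty log at
# grade 0, and mu concatenates logs, adding the grades m + m'.

def eta(a):
    return (a, [])                      # eta_A : A -> T_0 A

def mu(tta):                            # mu : T_m (T_m' A) -> T_{m+m'} A
    (inner, outer_log) = tta
    (a, inner_log) = inner
    return (a, outer_log + inner_log)

def fmap(f, ta):                        # functorial action of each T_m
    a, log = ta
    return (f(a), log)

# Kleisli-style composition in the induced graded category:
# f : A -> T_m B and g : B -> T_n C compose at grade m + n.
def kleisli(g, f):
    return lambda a: mu(fmap(g, f(a)))

tick = lambda x: (x + 1, ["tick"])      # grade 1
double = lambda x: (2 * x, ["double"])  # grade 1
step = kleisli(double, tick)
print(step(3))   # (8, ['tick', 'double']) -- at grade 1 + 1 = 2
```

This is exactly the Kleisli $M$-graded category construction sketched next: homsets indexed by the log-length bound.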

By mimicking the construction of Kleisli categories, we can construct an $M$-graded category $\mathbb{C}_T$ (which we call the Kleisli $M$-graded category of $T$) from a category $\mathbb{C}$ with an $M$-graded monad $T$ on $\mathbb{C}$.<sup>6</sup>


The dual construction is also possible. Let $D$ be an $M^{\mathrm{op}}$-graded comonad on a category $\mathbb{C}$. We then define $\mathbb{C}_D$ by $\mathbb{C}_D(X, Y)(m) = \mathbb{C}(D_m X, Y)$; the rest of the data is similar to the case of graded monads. This yields an $M$-graded category $\mathbb{C}_D$.

Remark 1. As an aside (included for completeness but not needed in the rest of the paper), graded categories are an instance of enriched categories. For the enriching category, we take the presheaf category [M, **Set**], together with Day's convolution product [10].

<sup>6</sup> Not to be confused with the Kleisli category of graded monads by Fujii et al. [13].

### **4.1 Homogeneous Coproducts in Graded Categories**

We model boolean values and natural numbers by the binary coproduct $1 + 1$ and the countable coproduct $\coprod_{i \in \mathbb{N}} 1$. We thus define what it means for a graded category to have coproducts. The following definition of binary coproducts easily extends to coproducts of families of objects.

**Definition 2.** Let $\mathbb{C}$ be an $M$-graded category. A homogeneous binary coproduct of $X_1, X_2 \in \mathbb{C}$ consists of an object $Z \in \mathbb{C}$ together with injections $\iota_1 \in \mathbb{C}(X_1, Z)(1)$ and $\iota_2 \in \mathbb{C}(X_2, Z)(1)$ such that, for any $m \in M$ and $Y \in \mathbb{C}$, the function $\lambda f.\,(f \circ \iota_1, f \circ \iota_2)$ of type $\mathbb{C}(Z, Y)(m) \to \mathbb{C}(X_1, Y)(m) \times \mathbb{C}(X_2, Y)(m)$ is invertible. The inverse is called the cotupling and denoted by $[-, -]$. It satisfies the usual laws of coproducts ($i = 1, 2$):

$$\begin{aligned} [f\_1, f\_2] \circ \iota\_i &= f\_i, & [\iota\_1, \iota\_2] &= \mathrm{id}\_Z, \\ g \circ [f\_1, f\_2] &= [g \circ f\_1, g \circ f\_2], & [\uparrow\_m^n f\_1, \uparrow\_m^n f\_2] &= \uparrow\_m^n [f\_1, f\_2]. \end{aligned}$$

When homogeneous binary coproducts exist for every pair $X_1, X_2 \in \mathbb{C}$, we say that $\mathbb{C}$ has homogeneous binary coproducts.

The difference between homogeneous coproducts and coproducts in ordinary category theory is that the cotupling is restricted to morphisms with the same grade. A similar constraint appears in some effect systems, where the typing rule for conditional expressions requires both branches to have the same effect.
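The same-grade restriction on cotupling can be sketched directly (our own illustration, hypothetical names): forming $[f_1, f_2]$ requires both components to carry the same grade $m$, just as both branches of a conditional must share an effect.

```python
# Sketch of homogeneous cotupling: [f1, f2] exists only when both
# components share the same grade m, mirroring effect-system typing of
# conditionals.  A "tagged" value (True, x) picks the left branch.

def cotuple(f1, f2):
    assert f1["grade"] == f2["grade"], "branches must share a grade"
    return {
        "fn": lambda tagged: (f1 if tagged[0] else f2)["fn"](tagged[1]),
        "grade": f1["grade"],   # the cotupling keeps the common grade
    }

then_b = {"fn": lambda x: x + 1, "grade": 3}
else_b = {"fn": lambda x: x - 1, "grade": 3}
branch = cotuple(then_b, else_b)
print(branch["fn"]((True, 10)), branch["fn"]((False, 10)), branch["grade"])
# 11 9 3
```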

**Proposition 1.** Let $\{\iota_i \in \mathbb{C}(X_i, Z)\}_{i \in I}$ be a coproduct of $\{X_i\}_{i \in I}$ in an ordinary category $\mathbb{C}$.


### **4.2 Graded Freyd Categories with Countable Coproducts**

We now introduce the central categorical structure of the loop language and GHL semantics: graded Freyd categories with homogeneous countable coproducts.

**Definition 3.** An M-graded Freyd category with homogeneous countable coproducts consists of the following data:


4. A function $(\ast)_{V,X,W,Y} : \mathbb{V}(V, W) \times \mathbb{C}(X, Y)(m) \to \mathbb{C}(V \times X, W \times Y)(m)$ for each $V, W, X, Y$ and $m \in M$. Below we use it as an infix operator and sometimes omit its subscripts. The role of this function is to combine pure and effectful computations in parallel.

The functions $I$ and $(\ast)$ satisfy the following equations:

$$\begin{aligned}
I(\mathrm{id}_X) &= \mathrm{id}_X & I(g \circ f) &= Ig \circ If & I(f \times g) &= f \ast Ig & \mathrm{id}_V \ast \mathrm{id}_X &= \mathrm{id}_{V \times X} \\
(g \circ f) \ast (i \circ j) &= (g \ast i) \circ (f \ast j) & f \ast \uparrow_m^n g &= \uparrow_m^n (f \ast g) \\
f \circ I(l_X) &= I(l_Y) \circ (\mathrm{id}_1 \ast f) & I(a_{X',Y',Z'}) \circ ((f \times g) \ast h) &= (f \ast (g \ast h)) \circ I(a_{X,Y,Z})
\end{aligned}$$

These are analogous to the usual Freyd categories axioms. We also require that:


We denote an $M$-graded Freyd category with countable coproducts by the tuple $(\mathbb{V}, 1, \times, \mathbb{C}, I, (\ast))$, capturing the main details: the cartesian monoidal structure of $\mathbb{V}$, the base category $\mathbb{C}$, the lifting function $I$ and the action $(\ast)$.

If the grading pomonoid $M$ is trivial, $\mathbb{C}$ becomes an ordinary category with countable coproducts. We therefore simply call it a Freyd category with countable coproducts. This is the same as a distributive Freyd category in the sense introduced by Power [46] and Staton [51]. We will use non-graded Freyd categories to give a semantics of the loop language in Section 4.3. An advantage of Freyd categories is that they encompass a broad class of models of computation, not limited to those arising from monads. A recent such example is Staton's category of s-finite kernels [52]<sup>7</sup>.

We could give an alternative abstract definition of $M$-graded Freyd category using 2-categorical language: a graded Freyd category is an equivariant morphism in the category of actions from a cartesian category to $M$-graded categories. The full details of this formulation will be discussed elsewhere.

A Freyd category typically arises from a strong monad on a cartesian category [47]. We give here a graded analogue of this fact. First, we recall the notion of strength for graded monads [24, Definition 2.5]. Let $(\mathbb{C}, 1, \times)$ be a cartesian monoidal category. A strong $M$-graded monad is a pair of an $M$-graded monad $(T, \eta, \mu)$ and a natural transformation $\mathrm{st}_{I,J,m} \in \mathbb{C}(I \times T_m J, T_m(I \times J))$ satisfying graded versions of the four coherence laws in [34, Definition 3.2]. We dually define a costrong $M$-graded comonad $(D, \varepsilon, \delta, \mathrm{cs})$ to be an $M$-graded comonad equipped with a costrength $\mathrm{cs}_{I,J,m} \in \mathbb{C}(D_m(I \times J), I \times D_m J)$.

**Proposition 2.** Let $(\mathbb{C}, 1, \times)$ be a cartesian monoidal category.

1. Let $(T, \eta, \mu, \mathrm{st})$ be a strong $M$-graded monad on $\mathbb{C}$. The Kleisli $M$-graded category $\mathbb{C}_T$, together with $If = \eta_W \circ f$ and $f \ast g = \mathrm{st}_{W,Y} \circ (f \times g)$, forms an $M$-graded Freyd category with homogeneous countable coproducts.

<sup>7</sup> It is not known whether the category of s-finite kernels is a Kleisli category.

2. Let $(D, \varepsilon, \delta, \mathrm{cs})$ be a costrong $M^{\mathrm{op}}$-graded comonad on $\mathbb{C}$ such that each $D_m$ preserves countable coproducts. Then the coKleisli $M$-graded category $\mathbb{C}_D$, together with $If = f \circ \varepsilon_V$ and $f \ast g = (f \times g) \circ \mathrm{cs}_{V,X}$, forms an $M$-graded Freyd category with homogeneous countable coproducts.

We often use the following 'ext' operation to structure interpretations of programs and GHL derivations. Let $\delta_X \in \mathbb{V}(X, X \times X)$ be the diagonal morphism. Then $\mathrm{ext} : \mathbb{C}(X, Y)(m) \to \mathbb{C}(X, X \times Y)(m)$ is defined by $\mathrm{ext}(f) = (\mathrm{id}_X \ast f) \circ I\delta_X$. When viewing $X$ as a set of environments, $\mathrm{ext}(f)$ may be seen as executing an effectful procedure $f$ under an environment, then extending the environment with the return value of $f$. In a non-graded setting, the definition of $\mathrm{ext}$ is analogous.
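The 'ext' operation can be sketched concretely (our own illustration, hypothetical names): for an effectful $f : X \to TY$, $\mathrm{ext}(f) : X \to T(X \times Y)$ duplicates the environment via the diagonal, runs $f$ on one copy, and pairs the untouched environment with $f$'s result. Here $T$ is a simple log-carrying monad.

```python
# Sketch of ext in a Kleisli setting: T A = (A, log).  ext(f) duplicates
# the environment x (the diagonal delta_X), runs f on one copy, and
# returns the environment extended with f's result.

def ext(f):
    def run(x):
        y, log = f(x)          # run the effectful procedure on x
        return ((x, y), log)   # keep x alongside the produced value
    return run

# A procedure reading the environment and producing a value plus a log:
lookup_double = lambda env: (env["n"] * 2, ["read n"])

print(ext(lookup_double)({"n": 21}))
# (({'n': 21}, 42), ['read n'])
```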

### **4.3 Semantics of the Loop Language in Freyd Categories**

Towards the semantics of GHL, we first give a more standard, non-graded categorical semantics of the loop language, preparing the following data.


$$\begin{aligned}
[\![\mathsf{bool}]\!] &= \mathsf{Bool} & [\![\mathsf{tt}]\!] &= \mathsf{tt} \in \mathbb{V}(1, \mathsf{Bool}) & [\![\mathsf{ff}]\!] &= \mathsf{ff} \in \mathbb{V}(1, \mathsf{Bool}), \\
[\![\mathsf{nat}]\!] &= \mathsf{Nat} & [\![\ulcorner k \urcorner]\!] &= \ulcorner k \urcorner \in \mathbb{V}(1, \mathsf{Nat}).
\end{aligned}$$

For convenience, we let $\mathsf{M} \triangleq [\![\Gamma_M]\!]$ (Section 3.1), i.e., all relevant (mutable) program variables are in scope, and write $\pi_v \in \mathbb{V}(\mathsf{M}, [\![\Gamma_M(v)]\!])$ for the projection morphism associated with a program variable $v \in \Gamma_M$.

Pure expressions are interpreted as V-morphisms and impure commands and procedures are interpreted as C-morphisms, of the form:


For the interpretation of programs, we first define some auxiliary morphisms. For each $v \in \Gamma_M$, let $\mathrm{upd}_v \in \mathbb{V}(\mathsf{M} \times [\![\Gamma_M(v)]\!], \mathsf{M})$ be the unique morphism (capturing memory updates) satisfying $\pi_v \circ \mathrm{upd}_v = \pi_2$ and $\pi_w \circ \mathrm{upd}_v = \pi_w \circ \pi_1$ for any $w \in \Gamma_M$ with $v \neq w$. We define $\mathrm{sub}(v, e) \in \mathbb{V}(\mathsf{M}, \mathsf{M})$ by $\mathrm{sub}(v, e) \triangleq \mathrm{upd}_v \circ \langle \mathrm{id}_\mathsf{M}, [\![e]\!] \rangle$, which updates the memory configuration at variable $v$ with the value of $e$.

For the interpretation of conditional and loop commands, we need coproducts over $\mathsf{M}$. Since $\mathbb{V}$ is distributive, we can form a binary coproduct $\mathsf{M} \times \mathsf{Bool}$ and a countable coproduct $\mathsf{M} \times \mathsf{Nat}$, with injections respectively defined as ($\forall k \in \mathbb{N}$):

$$\begin{aligned}
\mathsf{tm} &\triangleq \langle \mathrm{id}_\mathsf{M}, \mathsf{tt} \circ {!}_\mathsf{M} \rangle \in \mathbb{V}(\mathsf{M}, \mathsf{M} \times \mathsf{Bool}) & [k] &\triangleq \langle \mathrm{id}_\mathsf{M}, \ulcorner k \urcorner \circ {!}_\mathsf{M} \rangle \in \mathbb{V}(\mathsf{M}, \mathsf{M} \times \mathsf{Nat}) \\
\mathsf{fm} &\triangleq \langle \mathrm{id}_\mathsf{M}, \mathsf{ff} \circ {!}_\mathsf{M} \rangle \in \mathbb{V}(\mathsf{M}, \mathsf{M} \times \mathsf{Bool})
\end{aligned}$$

By Condition 1 of Definition 3, these coproducts are mapped to coproducts in C with injections:

$$\{I(\mathsf{tm}), I(\mathsf{fm}) \in \mathbb{C}(\mathsf{M}, \mathsf{M} \times \mathsf{Bool})\}, \qquad \{I([k]) \in \mathbb{C}(\mathsf{M}, \mathsf{M} \times \mathsf{Nat}) \mid k \in \mathbb{N}\}.$$

The cotuplings of these coproducts (written $[f, g]$ and $[f^{(k)}]_{k \in \mathbb{N}}$ respectively) are used next to interpret conditionals and loops.

We interpret a program $P$ of the loop language as a morphism $[\![P]\!] \in \mathbb{C}(\mathsf{M}, \mathsf{M})$:

$$\begin{aligned}
[\![P; P']\!] &= [\![P']\!] \circ [\![P]\!] \\
[\![v := e]\!] &= I(\mathrm{sub}(v, e)) \\
[\![\mathtt{do}\ v \leftarrow p]\!] &= I(\mathrm{upd}_v) \circ \mathrm{ext}[\![p]\!] \\
[\![\mathtt{if}\ e_b\ \mathtt{then}\ P\ \mathtt{else}\ P']\!] &= [[\![P]\!], [\![P']\!]] \circ \mathrm{ext}(I[\![e_b]\!]) \\
[\![\mathtt{loop}\ e_n\ \mathtt{do}\ P]\!] &= [[\![P]\!]^{(k)}]_{k \in \mathbb{N}} \circ \mathrm{ext}(I[\![e_n]\!])
\end{aligned}$$

Thus, the semantics of $\mathtt{loop}\ e_n\ \mathtt{do}\ P$ is such that, if the expression $e_n$ evaluates to some natural number $\ulcorner k \urcorner$, then $\mathtt{loop}\ e_n\ \mathtt{do}\ P$ is equivalent to the $k$-times sequential composition of $P$.
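The loop clause can be sketched operationally (our own illustration, hypothetical names): a program denotes a state transformer $\mathsf{M} \to \mathsf{M}$, and the cotupling over $\mathsf{M} \times \mathsf{Nat}$ selects the $k$-th iterate of $[\![P]\!]$ according to the value of $e_n$ in the current state.

```python
# Sketch of the loop semantics: a program is a state transformer M -> M;
# [[loop e_n do P]] evaluates e_n in the current state to a natural k,
# then applies [[P]] k times (the k-fold sequential composition).

def seq(p1, p2):
    return lambda m: p2(p1(m))

def loop(e_n, p):
    def run(m):
        k = e_n(m)              # [[e_n]] : M -> Nat, read off the state
        for _ in range(k):      # cotupling picks the k-th iterate of p
            m = p(m)
        return m
    return run

# loop n do (x := x + 1), starting from x = 0, n = 5:
body = lambda m: {**m, "x": m["x"] + 1}
prog = loop(lambda m: m["n"], body)
print(prog({"x": 0, "n": 5}))   # {'x': 5, 'n': 5}
```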

# **5 Modelling Graded Hoare Logic**

We now define the categorical model of GHL, building on the non-graded Freyd semantics of Section 4.3. Section 5.1 first models the base assertion logic, for which we use fibrations, giving an overview of the necessary mathematical machinery for completeness. Section 5.2 then defines the semantics of GHL and Section 5.3 instantiates it for the examples discussed previously in Section 3.

### **5.1 Interpretation of the Assertion Logic using Fibrations**

Our assertion logic (Section 3) has logical connectives of finite conjunctions, countable disjunctions, existential quantification and an equality predicate. A suitable categorical model for this fragment of first-order logic is offered by a coherent fibration [22, Def. 4.2.1], extended with countable joins in each fibre. We recap various key definitions and terminology from Jacobs' textbook [22].

In the following, let $\mathbb{P}$ and $\mathbb{V}$ be categories and $p : \mathbb{P} \to \mathbb{V}$ a functor.

We can regard the functor $p$ as attaching predicates to objects in $\mathbb{V}$. When $p\psi = X$, we regard $\psi \in \mathbb{P}$ as a predicate over $X \in \mathbb{V}$. When $f \in \mathbb{P}(\psi, \phi)$ is a morphism, we regard it as saying that $pf$ maps elements satisfying $\psi$ to those satisfying $\phi$ in $\mathbb{V}$. Parallel to this view of functors assigning predicates is the notion that entities in $\mathbb{P}$ are 'above' the entities in $\mathbb{V}$ to which $p$ maps them.

**Definition 4 ('Aboveness').** An object $\psi \in \mathbb{P}$ is said to be above an object $X \in \mathbb{V}$ if $p\psi = X$. Similarly, a morphism<sup>8</sup> $\dot{f} \in \mathbb{P}(\psi, \phi)$ is said to be above a morphism $f$ in $\mathbb{V}$ if $p\dot{f} = f \in \mathbb{V}(p\psi, p\phi)$. A morphism in $\mathbb{P}$ is vertical if it is above an identity morphism. Given $\psi, \phi \in \mathbb{P}$ and $f \in \mathbb{V}(p\psi, p\phi)$, we denote the set of all morphisms in $\mathbb{P}$ above $f$ by $\mathbb{P}_f(\psi, \phi) = \{\dot{f} \in \mathbb{P}(\psi, \phi) \mid p\dot{f} = f\}$.

<sup>8</sup> The dot notation here introduces a new name and should not be understood as applying some mathematical operator on f.

**Definition 5 (Fibre category).** A fibre category over $X \in \mathbb{V}$ is a subcategory of $\mathbb{P}$ consisting of the objects above $X$ and the morphisms above $\mathrm{id}_X$. This subcategory is denoted by $\mathbb{P}_X$; thus the homsets of $\mathbb{P}_X$ are $\mathbb{P}_X(\psi, \phi) = \mathbb{P}_{\mathrm{id}_X}(\psi, \phi)$.

We are ready to recall the central concept in fibrations: cartesian morphisms.

**Definition 6 (Cartesian morphism).** A morphism $\dot{f} \in \mathbb{P}(\psi, \phi)$ is cartesian if for any $\alpha \in \mathbb{P}$ and $g \in \mathbb{V}(p\alpha, p\psi)$, post-composition with $\dot{f}$ in $\mathbb{P}$, regarded as a function of type $\dot{f} \circ - : \mathbb{P}_g(\alpha, \psi) \to \mathbb{P}_{g \circ p\dot{f}}(\alpha, \phi)$, is a bijection. This amounts to the following universal property of cartesian morphisms: for any $\dot{h} \in \mathbb{P}(\alpha, \phi)$ above $g \circ p\dot{f}$, there exists a unique morphism $\dot{g} \in \mathbb{P}(\alpha, \psi)$ above $g$ such that $\dot{h} = \dot{f} \circ \dot{g}$. Intuitively, $\dot{f}$ represents the situation where $\psi$ is a pullback or inverse image of $\phi$ along $p\dot{f}$, and the universal property corresponds to that of pullbacks.

**Definition 7 (Fibration).** Finally, a functor $p : \mathbb{P} \to \mathbb{V}$ is a fibration if for any $\psi \in \mathbb{P}$, $X \in \mathbb{V}$, and $f \in \mathbb{V}(X, p\psi)$, there exist an object $\phi \in \mathbb{P}$ and a cartesian morphism $\dot{f} \in \mathbb{P}(\phi, \psi)$ above $f$, called the cartesian lifting of $f$ with $\psi$. We say that a fibration $p : \mathbb{P} \to \mathbb{V}$ is posetal if each $\mathbb{P}_X$ is a poset, corresponding to the implicational order between predicates. When $\psi \leq \phi$ holds in $\mathbb{P}_X$, we denote the corresponding vertical morphism in $\mathbb{P}$ as $\psi \Rightarrow \phi$.

Posetal fibrations are always faithful. The cartesian lifting of $f \in \mathbb{V}(X, p\psi)$ with $\psi$ exists uniquely. We thus write it as $\overline{f}\psi$, and its domain as $f^*\psi$. It can easily be shown that, for any morphism $f \in \mathbb{V}(X, Y)$ in $\mathbb{V}$, the assignment $\psi \in \mathbb{P}_Y \mapsto f^*\psi \in \mathbb{P}_X$ extends to a monotone function $f^* : \mathbb{P}_Y \to \mathbb{P}_X$. We call it the reindexing function (along $f$). Furthermore, the assignment $f \mapsto f^*$ satisfies the (contravariant) functoriality: $\mathrm{id}_X^* = \mathrm{id}_{\mathbb{P}_X}$ and $(g \circ f)^* = f^* \circ g^*$. A fibration is a bifibration if each reindexing function $f^* : \mathbb{P}_Y \to \mathbb{P}_X$ for $f \in \mathbb{V}(X, Y)$ has a left adjoint, denoted by $f_* : \mathbb{P}_X \to \mathbb{P}_Y$. $f_*\psi$ is always associated with a morphism $\underline{f}\psi : \psi \to f_*\psi$ above $f$, called the opcartesian lifting of $f$ with $\psi$. For the universal property of the opcartesian lifting, see Jacobs [22, Def. 9.1.1].

**Fibrations for our Assertion Logic** It is widely known that coherent fibrations are suitable for interpreting the $\wedge, \vee, \exists, =$-fragment of first-order logic (see [22, Chapter 4, Def. 4.2.1]). Based on this fact, we introduce a class of fibrations suitable for our assertion logic; due to the countable joins of the assertion logic, we modify the definition of coherent fibration accordingly.

**Definition 8.** A fibration for assertion logic over $\mathbb{V}$ is a posetal fibration $p : \mathbb{P} \to \mathbb{V}$, for cartesian $\mathbb{V}$ with distributive countable coproducts, such that:


4. The reindexing function $w_{X,Y}^*$ along the weakening $w_{X,Y} \triangleq \pi_1 \in \mathbb{V}(X \times Y, X)$ has a left adjoint $\exists_{X,Y} \dashv w_{X,Y}^*$, satisfying the Beck-Chevalley condition and the Frobenius property; we refer to [22, Definition 1.9.1, 1.9.12].

This is almost the same as the definition of coherent fibrations [22, Definition 4.2.1]; the differences are that 1) the base category $\mathbb{V}$ has countable coproducts, 2) we require each fibre to be a poset, which makes object equalities hold on-the-nose, and 3) we require each fibre to have countable joins. These will be combined with the countable coproducts of $\mathbb{V}$ to equip $\mathbb{P}$ with countable coproducts [22].

Example 6. A typical example of a fibration for assertion logic is the subobject fibration $p_{\mathbf{Set}} : \mathbf{Pred} \to \mathbf{Set}$; the category $\mathbf{Pred}$ has as objects pairs $(X, \psi)$ of sets such that $\psi \subseteq X$, and as morphisms $(X, \psi) \to (Y, \phi)$ the functions $f : X \to Y$ such that $f(\psi) \subseteq \phi$. The functor $p_{\mathbf{Set}}$ sends $(X, \psi)$ to $X$ and $f$ to itself. More examples can be found in the work of Jacobs [22, Section 4].
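In this concrete fibration, the abstract machinery specialises to familiar set operations, which the following sketch (our own illustration) makes explicit: reindexing $f^*$ is preimage (the cartesian lifting) and its left adjoint $f_*$ is direct image (the opcartesian lifting), exhibiting the bifibration structure.

```python
# Sketch of Pred -> Set: predicates are subsets psi of X.  Reindexing
# f* is preimage; its left adjoint f_* is image; the adjunction states
# f_* psi <= phi  iff  psi <= f* phi.

def reindex(f, X, phi):
    """f* phi = {x in X | f(x) in phi}: pullback of phi along f."""
    return {x for x in X if f(x) in phi}

def direct_image(f, psi):
    """f_* psi = {f(x) | x in psi}: left adjoint to reindexing."""
    return {f(x) for x in psi}

X, Y = {0, 1, 2, 3}, {0, 1, 2}
f = lambda x: x % 3
phi = {0, 1}                 # a predicate over Y
psi = {2, 3}                 # a predicate over X

print(sorted(reindex(f, X, phi)))       # [0, 1, 3]
print(sorted(direct_image(f, psi)))     # [0, 2]
# both sides of the adjunction agree (here both are False):
print((direct_image(f, psi) <= phi) == (psi <= reindex(f, X, phi)))
```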

For a parallel pair of morphisms $f, g \in \mathbb{V}(X, Y)$, we define the equality predicate $\mathrm{Eq}(f, g)$ above $X$ to be $\langle \mathrm{id}_X, f, g \rangle^* \mathrm{Eq}_{X,Y}(X \times Y)$ [22, Notation 3.4.2]. Intuitively, $\mathrm{Eq}(f, g)$ corresponds to the predicate $\{x \in X \mid f(x) = g(x)\}$. In this paper, we will use some facts about the equality predicate shown by Jacobs [22, Proposition 3.4.6, Lemma 3.4.5, Notation 3.4.2, Example 4.3.7].

**The Semantics of Assertion Logic** We move to the semantics of our assertion logic in a fibration $p : \mathbb{P} \to \mathbb{V}$ for assertion logic. The basic idea is to interpret a formula $\psi \in \mathbf{Fml}_{\Sigma_l}(\Gamma)$ as an object in $\mathbb{P}_{[\![\Gamma]\!]}$, and an entailment $\Gamma \mid \psi \vdash \phi$ as the order relation $[\![\psi]\!] \leq [\![\phi]\!]$ in $\mathbb{P}_{[\![\Gamma]\!]}$. The semantics is given by the following interpretation of the data specifying the assertion logic (given in Section 3.3):


The interpretation $[\![\psi]\!]$ of $\psi \in \mathbf{Fml}_{\Sigma_l}(\Gamma)$ is inductively defined as a $\mathbb{P}_{[\![\Gamma]\!]}$-object:

$$\begin{aligned}
[\![P(t_1, \ldots, t_n)]\!] &= \langle [\![t_1]\!], \ldots, [\![t_n]\!] \rangle^* [\![P]\!] & [\![t = u]\!] &= \mathrm{Eq}([\![t]\!], [\![u]\!]) \\
[\![\textstyle\bigwedge_i \psi_i]\!] &= \textstyle\bigwedge_i [\![\psi_i]\!] & [\![\textstyle\bigvee_i \psi_i]\!] &= \textstyle\bigvee_i [\![\psi_i]\!] & [\![\exists x : s\,.\,\psi]\!] &= \exists_{[\![\Gamma]\!], [\![s]\!]}([\![\psi]\!])
\end{aligned}$$

### **5.2 Interpretation of Graded Hoare Logic**

We finally introduce the semantics of Graded Hoare Logic. This semantics interprets derivations of GHL judgements $\vdash_m \{\psi\}\ P\ \{\phi\}$ as $m$-graded morphisms in a graded category. Moreover, it is built above the interpretation $[\![P]\!] \in \mathbb{C}(\mathsf{M}, \mathsf{M})$ of the program $P$ in the non-graded semantics introduced in Section 4.3. The underlying structure is given as a combination of a fibration for the assertion logic and a graded category over $\mathbb{C}$, as depicted in (1) (Section 2, p. 237).

**Definition 9.** A GHL structure over a Freyd category $(\mathbb{V}, 1, \times, \mathbb{C}, I, \ast)$ with countable coproducts and a fibration $p : \mathbb{P} \to \mathbb{V}$ for assertion logic comprises:


The above data satisfy the following properties:

1. That $q$ behaves 'functorially', preserving structure from $\mathbb{E}$ to $\mathbb{V}$:

$$\begin{aligned}
q(\mathrm{id}_\phi) &= \mathrm{id}_{p\phi}, & q(g \circ f) &= qg \circ qf, & q_{\psi,\phi,n}(\uparrow_m^n f) &= q_{\psi,\phi,m} f \\
q(\dot{I}f) &= I(pf), & q(f \circledast g) &= pf \ast qg
\end{aligned}$$


The last condition asserts that if the precondition is the least element $\perp_X$ of the fibre over $X \in \mathbb{V}$, which represents the false assertion, then we can trivially conclude any postcondition $\phi$ and grading $m$ for any morphism of type $X \to p\phi$ in $\mathbb{C}$.

The semantics of GHL then requires a graded Freyd category with countable coproducts, and morphisms in the graded category guaranteeing a sound model of the effectful primitives (commands/procedures), captured by the data:


where $[\![c]\!]$, $[\![p]\!]$ and later $[\![e]\!]$ are from the underlying non-graded model (Sec. 4.3). We interpret a derivation of a GHL judgement $\vdash_m \{\phi\}\ P\ \{\psi\}$ as a morphism

$$[\![\vdash_m \{\phi\}\ P\ \{\psi\}]\!] \in \mathbb{E}([\![\phi]\!], [\![\psi]\!])(m) \text{ such that } q_{[\![\phi]\!], [\![\psi]\!], m} [\![\vdash_m \{\phi\}\ P\ \{\psi\}]\!] = [\![P]\!].$$

The constraint on the right is guaranteed by the soundness of the interpretation (Theorem 1). From the functors-as-refinements viewpoint [32], the interpretation $[\![\vdash_m \{\phi\}\ P\ \{\psi\}]\!]$ witnesses that $[\![P]\!]$ respects the refinements $\phi$ and $\psi$ of $\mathsf{M}$, and additionally witnesses that the grade of $[\![P]\!]$ is $m$. We first cover the simpler cases of the interpretation of GHL derivations:

$$\begin{aligned}
[\![\vdash_1 \{\psi\}\ \mathtt{skip}\ \{\psi\}]\!] &= \mathrm{id}_{[\![\psi]\!]} \\
[\![\vdash_{m_1 \cdot m_2} \{\psi\}\ P_1; P_2\ \{\theta\}]\!] &= [\![\vdash_{m_2} \{\psi_1\}\ P_2\ \{\theta\}]\!] \circ [\![\vdash_{m_1} \{\psi\}\ P_1\ \{\psi_1\}]\!] \\
[\![\vdash_1 \{\psi[e/v]\}\ v := e\ \{\psi\}]\!] &= \dot{I}(\overline{\mathrm{sub}(v, e)}[\![\psi]\!]) \\
[\![\vdash_m \{\psi\}\ \mathtt{do}\ c\ \{\psi\}]\!] &= \dot{I}(\pi_1) \circ \mathrm{ext}\langle c \rangle \\
[\![\vdash_m \{\psi\}\ \mathtt{do}\ v \leftarrow p\ \{(\exists v.\psi) \wedge \phi\}]\!] &= \dot{I}(\underline{\mathrm{upd}_v}([\![\psi]\!] \times [\![\phi]\!])) \circ \mathrm{ext}\langle p \rangle \\
[\![\vdash_{m'} \{\psi'\}\ P\ \{\phi'\}]\!] &= \dot{I}([\![\phi]\!] \Rightarrow [\![\phi']\!]) \circ \uparrow_m^{m'} [\![\vdash_m \{\psi\}\ P\ \{\phi\}]\!] \circ \dot{I}([\![\psi']\!] \Rightarrow [\![\psi]\!])
\end{aligned}$$

The morphisms with over- and underlines are cartesian and opcartesian liftings in the fibration $p : \mathbb{P} \to \mathbb{V}$ of the assertion logic. The codomain of the interpretation of the procedure call $\mathtt{do}\ v \leftarrow p$ is equal to $[\![(\exists v.\psi) \wedge \phi]\!]$.

The above interpretations largely follow the form of the underlying model of Section 4.3, now mapped into $\mathbb{E}$ with the additional categorical machinery for grades and assertions. The interpretation of conditional and loop commands requires some more reasoning.

Conditionals Let $p_1, p_2$ be the interpretations of the two branches of the conditional command:

$$\begin{aligned} p\_1 &= \left[ \vdash\_m \left\{ \psi \wedge e\_b = \mathtt{tt} \right\} P\_1 \left\{ \phi \right\} \right] \in \mathbb{E}(\left[ \psi \wedge e\_b = \mathtt{tt} \right], \left[ \phi \right]) (m) \\ p\_2 &= \left[ \vdash\_m \left\{ \psi \wedge e\_b = \mathtt{ff} \right\} P\_2 \left\{ \phi \right\} \right] \in \mathbb{E}(\left[ \psi \wedge e\_b = \mathtt{ff} \right], \left[ \phi \right]) (m) \end{aligned}$$

We consider the opcartesian lifting $\underline{\langle \mathrm{id}_\mathsf{M}, [\![e_b]\!] \rangle}[\![\psi]\!] : [\![\psi]\!] \to \langle \mathrm{id}_\mathsf{M}, [\![e_b]\!] \rangle_* [\![\psi]\!]$. We name its codomain $\mathsf{Im}$. Next, the cartesian morphisms $\overline{\mathsf{tm}}(\mathsf{Im}) : \mathsf{tm}^* \mathsf{Im} \to \mathsf{Im}$ and $\overline{\mathsf{fm}}(\mathsf{Im}) : \mathsf{fm}^* \mathsf{Im} \to \mathsf{Im}$ in $\mathbb{P}$ are above the coproduct $(\mathsf{M} \times \mathsf{Bool}, \mathsf{tm}, \mathsf{fm})$ in $\mathbb{V}$. Then the interpretations of the preconditions of $P_1, P_2$ are inverse images of $\mathsf{Im}$ along $\mathsf{tm}, \mathsf{fm} : \mathsf{M} \to \mathsf{M} \times \mathsf{Bool}$:

**Lemma 1.** $[\![\psi \wedge e_b = \mathtt{tt}]\!] = \mathsf{tm}^* \mathsf{Im}$ and $[\![\psi \wedge e_b = \mathtt{ff}]\!] = \mathsf{fm}^* \mathsf{Im}$.

The side condition of the conditional rule ensures that $(\mathsf{Im}, \overline{\mathsf{tm}}(\mathsf{Im}), \overline{\mathsf{fm}}(\mathsf{Im}))$ is a coproduct in $\mathbb{P}$:

**Lemma 2.** $\Gamma_M \mid \psi \vdash e_b = \mathtt{tt} \vee e_b = \mathtt{ff}$ implies $\mathsf{Im} = \mathsf{tm}_* \mathsf{tm}^* \mathsf{Im} \vee \mathsf{fm}_* \mathsf{fm}^* \mathsf{Im}$.

Therefore the image of the coproduct $(\mathsf{Im}, \overline{\mathsf{tm}}(\mathsf{Im}), \overline{\mathsf{fm}}(\mathsf{Im}))$ under $\dot{I}$ yields a homogeneous coproduct in $\mathbb{E}$. We take the cotupling $[p_1, p_2] \in \mathbb{E}(\mathsf{Im}, [\![\phi]\!])(m)$ with respect to this homogeneous coproduct. We finally define the interpretation of the conditional rule to be the following composite:

$$[\![\vdash_m \{\psi\}\ \mathtt{if}\ e_b\ \mathtt{then}\ P_1\ \mathtt{else}\ P_2\ \{\phi\}]\!] = [p_1, p_2] \circ \dot{I}(\underline{\langle \mathrm{id}_\mathsf{M}, [\![e_b]\!] \rangle}[\![\psi]\!]) \in \mathbb{E}([\![\psi]\!], [\![\phi]\!])(m).$$

Loops Fix $N \in \mathbb{N}$, and suppose that $\vdash_m \{\psi_{i+1}\}\ P\ \{\psi_i\}$ is derivable in graded Hoare logic for each $0 \leq i < N$. Let $p_i \in \mathbb{E}([\![\psi_{i+1}]\!], [\![\psi_i]\!])(m)$ be the interpretation $[\![\vdash_m \{\psi_{i+1}\}\ P\ \{\psi_i\}]\!]$. We then define a countable family of morphisms (using here ex falso quodlibet):

$$b_i = \begin{cases} q^{-1}_{\perp_\mathsf{M}, [\![\psi_0]\!], m^N}([\![P]\!]^{(i)}) \in \mathbb{E}(\perp_\mathsf{M}, [\![\psi_0]\!])(m^N) & (i \neq N) \\ p_0 \circ \cdots \circ p_{N-1} \in \mathbb{E}([\![\psi_N]\!], [\![\psi_0]\!])(m^N) & (i = N) \end{cases}$$

Let $\theta_i \triangleq \mathrm{cod}(b_i)$. Then $\coprod_{i \in \mathbb{N}} \theta_i = \bigvee_{i \in \mathbb{N}} [i]_* \theta_i = [N]_* [\![\psi_N]\!]$, because $[i]_* \theta_i$ is either $\perp_{\mathsf{M} \times \mathsf{Nat}}$ or $[N]_* [\![\psi_N]\!]$. We then send the coproduct $\theta_i \to \coprod_{i \in \mathbb{N}} \theta_i$ by $\dot{I}$ and obtain a homogeneous coproduct in $\mathbb{E}$. By taking the cotupling of all $b_i$ with this homogeneous coproduct, we obtain a morphism $[b_i]_{i \in \mathbb{N}} \in \mathbb{E}([N]_* [\![\psi_N]\!], [\![\psi_0]\!])(m^N)$.

**Lemma 3.** $\Gamma_M \mid \psi_N \vdash e_n = \ulcorner N \urcorner$ implies $\langle \mathrm{id}_\mathsf{M}, [\![e_n]\!] \rangle_* [\![\psi_N]\!] = [N]_* [\![\psi_N]\!]$.

We then define $[\![\vdash_{m^N} \{\psi_N\}\ \mathtt{loop}\ e_n\ \mathtt{do}\ P\ \{\psi_0\}]\!] = [b_i]_{i \in \mathbb{N}} \circ \dot{I}(\underline{\langle \mathrm{id}_\mathsf{M}, [\![e_n]\!] \rangle}[\![\psi_N]\!])$.

**Theorem 1 (Soundness of GHL).** For any derivation of a GHL judgement $\vdash_m \{\phi\}\ P\ \{\psi\}$, we have $q_{[\![\phi]\!], [\![\psi]\!], m} [\![\vdash_m \{\phi\}\ P\ \{\psi\}]\!] = [\![P]\!]$.

### **5.3 Instances of Graded Hoare Logic**

We first present a construction of GHL structures from graded liftings of monads, a graded version of the concept of monad lifting [11,19,26].

**Definition 10 (Graded Liftings of Monads).** Consider two cartesian categories $\mathbb{E}$ and $\mathbb{C}$ and a functor $q : \mathbb{E} \to \mathbb{C}$ strictly preserving finite products. We say that a strong $M$-graded monad $(\dot{T}, \dot{\eta}, \dot{\mu}_{m,m'}, \dot{\mathrm{st}}_m)$ on $\mathbb{E}$ is an $M$-graded lifting of a strong monad $(T, \eta^T, \mu, \mathrm{st})$ on $\mathbb{C}$ along $q$ if $q \circ \dot{T}_m = T \circ q$, $q(\dot{\eta}_\psi) = \eta^T_{q\psi}$, $q(\dot{\mu}_{m,m',\psi}) = \mu_{q\psi}$, $q(\dot{T}(m_1 \leq m_2)_\psi) = \mathrm{id}$, and $q(\dot{\mathrm{st}}_{\psi,\phi,m}) = \mathrm{st}_{q\psi,q\phi}$.

**Theorem 2.** Let $\mathbb{V}$ be a cartesian category with distributive countable coproducts, and let $p : \mathbb{P} \to \mathbb{V}$ be a fibration for assertion logic. Let $T$ be a strong monad on $\mathbb{V}$ and $\dot{T}$ an $M$-graded lifting of $T$ along $p$. Then the $M$-graded Freyd category $(\mathbb{P}, 1, \dot{\times}, \mathbb{P}_{\dot{T}}, J, \circledast)$ with homogeneous countable coproducts, together with the function $q_{\psi,\phi,m} : \mathbb{P}_{\dot{T}}(\psi, \phi)(m) \to \mathbb{V}_T(p\psi, p\phi)$ defined by $q_{\psi,\phi,m}(f) = pf$, is a GHL structure over $(\mathbb{V}, 1, \times, \mathbb{V}_T, I, \ast)$ and $p$.

Before seeing examples, we introduce some notation and fibrations for the assertion logic. Let p : P → V be a fibration for the assertion logic. Below we use the following notation: for f ∈ V(I, J), ψ ∈ P<sub>I</sub>, and φ ∈ P<sub>J</sub>, by f : ψ →̇ φ we mean the statement "there exists a morphism ḟ ∈ P(ψ, φ) such that pḟ = f". Such an ḟ is unique due to the faithfulness of p : P → V.

Example 7 (Example 4: Union Bound Logic). To derive the GHL structure suitable for the semantics of the Union Bound Logic discussed in Example 4, we invoke Theorem 2 by letting p be p<sub>**Set**</sub> : **Pred** → **Set** (Example 6), T be the subdistribution monad D, and Ṫ be the strong (R<sub>≥0</sub>, ≤, 0, +)-graded lifting U of D defined by U(δ)(X, P) ≜ (D(X), {d | d(X \ P) ≤ δ}). The induced GHL structure is suitable for the semantics of GHL for Union Bound Logic in Example 4. The soundness of the inference rules follows from the GHL structure, as shown in Section 5.2. To complete the semantics of GHL for the Union Bound Logic, we give the semantics [[p]] of procedures p ∈ C<sup>s</sup><sub>p</sub>. Example 4 already gave a semantic condition for these operators:

$$\begin{aligned} &C\_{\mathsf{p}}^{s}(\phi,\beta,\psi) \\ &= \{ \mathsf{sample}\_{\mu,e} \mid \forall s.\, s \in [\phi] \implies \mathsf{Pr}\_{s' \leftarrow \{ \mathsf{sample}\_{\mu,e} \}(s)}[s' \in [\neg\psi]] \le \beta \} \\ &= \{ \mathsf{sample}\_{\mu,e} \mid \{ \mathsf{sample}\_{\mu,e} \} \in \mathbf{Pred}\_{D}([\phi], [\psi])(\beta) \} \end{aligned}$$

For any sample<sub>μ,e</sub> ∈ C<sup>s</sup><sub>p</sub>(φ, β, ψ), the interpretation of sample<sub>μ,e</sub> is [[sample<sub>μ,e</sub>]].
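To make the grading condition concrete, the following sketch models it over finite subdistributions. The dictionary representation, the `in_grade` check over a finite list of states, and the `step` command are our own illustrative assumptions, not the paper's code:

```python
# A finite model (our own sketch) of the union-bound grading: a
# subdistribution is a dict {outcome: probability}, and a Kleisli map f
# inhabits the graded predicate Pred_D([[phi]], [[psi]])(beta) when, from
# every state satisfying phi, the mass landing outside psi is at most beta.

def mass(pred, dist):
    """Total probability of outcomes satisfying pred."""
    return sum(w for x, w in dist.items() if pred(x))

def in_grade(beta, phi, psi, f, states):
    """Check the grading condition over a finite list of states."""
    return all(mass(lambda x: not psi(x), f(s)) <= beta
               for s in states if phi(s))

# a fair coin that either keeps the state or increments it
def step(s):
    return {s: 0.5, s + 1: 0.5}
```

For instance, `step` maps φ = (s == 0) to ψ = (s == 0) at grade 0.5, but at no smaller grade, and to ψ = (s ≤ 1) at grade 0.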

Example 8 (Example 3: Program Counter Security). To derive the GHL structure suitable for GHL with program counter security, we invoke Theorem 2 with:


The derived GHL structure is suitable for the semantics of GHL in Example 3. To complete the structure of the logic, we need to interpret the two commands cfTT, cfFF ∈ **CExp** and set the axioms of commands C<sub>c</sub>.

First, [[cfTT]], [[cfFF]] : [[M]] → 1 in **ERel**<sub>Ẇ</sub> are defined by [[cfTT]] ≡ (∗, tt) and [[cfFF]] ≡ (∗, ff). Finally, we define C<sub>c</sub> as follows (recall that ≤ is the prefix ordering on strings):

$$C\_{\mathbf{c}}(\psi, \sigma) = \{\mathbf{cfTT} \mid \mathbf{tt} \le \sigma\} \cup \{\mathbf{cfFF} \mid \mathbf{ff} \le \sigma\}.$$

Note that the graded lifting Ẇ<sup>s</sup><sub>σ</sub> relates only pairs (x, σ′) and (y, σ′) with a common control-flow string. Hence, a proof-tree derivation in this logic forces the target program to have the same control flow under the precondition.

Example 9 (GHL structure from the product comonad). In the category **Set**, the functor CX ≜ X × N forms a coproduct-preserving comonad called the product comonad. The right adjoint I : **Set** → **Set**<sub>C</sub> of the coKleisli resolution of C yields a Freyd category with countable coproducts. We next introduce an (N, ≤, 0, max)-graded lifting Ċ of the comonad C along the fibration p<sub>**Set**</sub> : **Pred** → **Set**, defined by Ċ<sub>n</sub>(X, P) ≜ (CX, {(x, m) ∈ X × N | x ∈ P, m ≥ n}). Similarly, we obtain an (N, ≤, 0, max)-graded Freyd category (J, ∗̇) induced by the graded lifting Ċ. In this way we obtain a GHL structure.

By instantiating GHL with the above GHL structure, we obtain a program logic useful for reasoning about security levels. For example, when program P<sup>1</sup> requires security level 3 and P<sup>2</sup> requires security level 7, the sequential composition P1; P<sup>2</sup> requires the higher security level 7 (= max(3, 7)).

We give a simple structure for verifying security levels determined by memory access. Fix a function VarLV : dom(Γ<sub>M</sub>) → N assigning security levels to variables. For any expression e, we define its required security level SecLV(e) = sup{VarLV(x) | x ∈ FV(e)}. Using this, for each expression e of sort s ∈ S we introduce a procedure secr<sub>e</sub> ∈ **PExp**<sup>s</sup> called a secured expression. It returns the value of e if the level is high enough; otherwise it returns a meaningless constant:

[[secre]](n, ξ) = if n ≥ SecLV(e) then [[e]](ξ) else a fixed constant cs.
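As an executable illustration of this clause, here is a small sketch. The concrete variable levels, the expression format (a list of variables to be summed), and the constant 0 standing in for c<sub>s</sub> are all our own assumptions:

```python
# A sketch of secured expressions; levels and the expression format are
# illustrative assumptions, not the paper's definitions.

VAR_LV = {"lo": 1, "hi": 7}   # assumed security levels per variable

def sec_lv(variables):
    """SecLV(e): the sup of the levels of e's free variables."""
    return max((VAR_LV.get(v, 0) for v in variables), default=0)

def secr(n, variables, memory):
    """[[secr_e]](n, xi): the real value if clearance n suffices,
    otherwise a fixed, meaningless constant (0 plays the role of c_s)."""
    if n >= sec_lv(variables):
        return sum(memory.get(v, 0) for v in variables)
    return 0
```

With clearance 7 the sum of "hi" and "lo" is computed; with clearance 3 the same expression is suppressed and the constant is returned instead.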

Secured expressions can be introduced through the following Cp:

$$C^s\_\mathbf{p}(\phi, l, \psi) = \{ \mathbf{secr}\_e \mid e: s,\ [\mathbf{secr}\_e] \in \mathbf{Pred}\_C([\phi], [\psi])(l),\ \mathrm{SecLV}(e) \le l\}.$$

The pomonoid (N, <sup>≤</sup>, <sup>0</sup>, max) in the above can also be replaced with a join semilattice with a least element (Q, ≤, ⊥, ∨). Thus, GHL can be instantiated to a graded comonadic model of security and its associated reasoning.

# **6 Related Work**

Several works have studied abstract semantics of Hoare Logic. Martin et al. [31] give a categorical framework based on traced symmetric monoidal closed categories. They also show that their framework can handle extensions such as separation logic. However, their framework does not directly model effects, and it cannot accommodate grading as is. Goncharov and Schröder [18] study a Hoare Logic to reason in a generic way about programs with side effects. Their logic and its underlying semantics are based on an order-enriched monad, and they show a relative completeness result. Similarly, Hasuo [20] studies an abstract weakest precondition semantics based on an order-enriched monad. A similar categorical model has also been used by Jacobs [23] to study the Dijkstra monad and the Hoare monad. In the logic by Goncharov and Schröder [18], effects are encapsulated in monadic types, while the weakest precondition semantics by Hasuo [20] and the semantics by Jacobs [23] have no underlying calculus. Moreover, none of them is graded. Maillard et al. [29] study a semantic framework based on the Dijkstra monad for program verification. Their framework enables reasoning about different side effects, and it separates specification from computation. Their Dijkstra monad has a flavor of grading, but the structure they use is more complex than a pomonoid. Maillard et al. [30] focus on relational program logics for effectful computations. They show how these logics can be derived in a relational dependent type theory, but their logics are not graded.

As we discussed in the introduction, several works have used grading structures similar to the one we study in this paper, although often with different names. Katsumata studied monads graded by a pomonoid as a semantic model for effect systems [24]. A similar approach has also been studied elsewhere [36,42]. Formal categorical properties of graded monads are pursued by Fujii et al. [13]. Zhang defines a notion of graded category, but it differs from ours and is instead closer to a definition of a graded monad [55]. As we showed in Section 4, graded categories can be constructed from both monads and comonads graded by a pomonoid, and they can also capture graded structures that do not arise from either of them. Milius et al. [33] also studied monads graded by a pomonoid in the context of trace semantics, where the grading represents a notion of depth corresponding to trace length. Exploring whether there is a generalization of our work to traces is an interesting direction for future work.

Various works study comonads graded with a semiring structure as a semantic model of contextual computations captured by means of type systems [7,16,44]. In contrast, our graded comonads are graded by a pomonoid. The additive structure of the semiring in those works is needed to merge the gradings of different instances of the same variable. This is natural for the λ-calculus, where the context represents multiple inputs but there is only one conclusion (output). Here, instead, we focus on an imperative language, so we have only one input, the starting memory, and one output, the updated memory. Therefore, it is natural to have just the multiplicative structure of the semiring as a pomonoid. The categorical axiomatics of semiring-graded comonads are studied by Katsumata from the double-category theoretic perspective [25].

Apart from graded monads, several generalizations of monads have been proposed. Atkey introduces parameterized monads and corresponding parameterized Freyd categories [1], demonstrating that parameterized monads naturally model effectful computations with preconditions and postconditions. Tate defines productors, with composability of effectful computations controlled by a relational 'effector' structure [53]. Orchard et al. define category-graded monads, generalizing graded and parameterized monads via lax functors, and sketch a model of Union Bound Logic in this setting (but predicates and graded-predicate interaction are not modelled, as they are here) [41]. Interesting future work is to combine these general models of computational effects with Hoare logic.

# **7 Conclusion**

We have presented a Graded Hoare Logic as a parameterisable framework for reasoning about programs and their side effects, and studied its categorical semantics. The key guiding idea is that grading can be seen as a refinement of effectful computations. This has brought us naturally to graded categories but to fully internalize this refinement idea we further introduced the new notion of graded Freyd categories. To show the generality of our framework we have shown how different examples are naturally captured by it.

We conclude with some reflections on possible future work.

Future work Carbonneaux et al. present a quantitative verification approach for amortized cost analysis via a Hoare logic augmented with multivariate quantities associated to program variables [8]. Judgments {Γ; Q} S {Γ′; Q′} have pre- and post-conditions Γ and Γ′ and potential functions Q and Q′. Their approach can be mapped to GHL with a grading monoid representing how the potential functions change. However, the multivariate nature of the analysis requires a more fine-grained connection between the structure of the memory and the structure of grades, which has not been developed yet. We leave this for future work.

GHL allows us to capture the dependencies between assertions and grading that graded program logics usually use. However, some graded systems (e.g. [4]) use more explicit dependencies by allowing grade variables—which are also used for grading polymorphism. We plan to explore this direction in future work.

The setting of graded categories in this work subsumes both graded monads and graded comonads and allows flexibility in the model. However, most of our examples in Section 5.3 are related to graded monads. The literature contains various graded comonad models of data-flow properties, such as liveness analysis [44], sensitivities [7], timing and scheduling [16], and information-flow control [40]. Future work is to investigate how these structures could be adapted to GHL for reasoning about programs.

Acknowledgements Katsumata and Sato carried out this research supported by ERATO HASUO Metamathematics for Systems Design Project (No. JPM-JER1603), JST. Orchard is supported by EPSRC grant EP/T013516/1. Gaboardi is supported by the National Science Foundation under Grant No. 2040222.

# **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

### **Do Judge a Test by its Cover: Combining Combinatorial and Property-Based Testing**

Harrison Goldstein<sup>1</sup>, John Hughes<sup>2</sup>, Leonidas Lampropoulos<sup>3</sup>, and Benjamin C. Pierce<sup>1</sup>

<sup>1</sup> University of Pennsylvania, Philadelphia PA 19104, USA <sup>2</sup> Chalmers University of Technology and Quviq AB, 412 96 Gothenburg, Sweden <sup>3</sup> University of Maryland, College Park MD 20742, USA

**Abstract.** Property-based testing uses randomly generated inputs to validate high-level program specifications. It can be shockingly effective at finding bugs, but it often requires generating a very large number of inputs to do so. In this paper, we apply ideas from combinatorial testing, a powerful and widely studied testing methodology, to modify the distributions of our random generators so as to find bugs with fewer tests. The key concept is combinatorial coverage, which measures the degree to which a given set of tests exercises every possible choice of values for every small combination of input features.

In its "classical" form, combinatorial coverage only applies to programs whose inputs have a very particular shape—essentially, a Cartesian product of finite sets. We generalize combinatorial coverage to the richer world of algebraic data types by formalizing a class of sparse test descriptions based on regular tree expressions. This new definition of coverage inspires a novel combinatorial thinning algorithm for improving the coverage of random test generators, requiring many fewer tests to catch bugs. We evaluate this algorithm on two case studies, a typed evaluator for System F terms and a Haskell compiler, showing significant improvements in both.

**Keywords:** Combinatorial testing, Combinatorial coverage, QuickCheck, Property-based testing, Regular tree expressions, Algebraic data types

# **1 Introduction**

Property-based testing, popularized by tools like QuickCheck [7], is a principled way of testing software that focuses on functional specifications rather than suites of input-output examples. A property is a formula like

∀x. P(x, f(x)),

hgo@seas.upenn.edu

For the full version, including all appendices, visit https://harrisongoldste.in/papers/quick-cover.pdf.

© The Author(s) 2021

N. Yoshida (Ed.): ESOP 2021, LNCS 12648, pp. 264–291, 2021. https://doi.org/10.1007/978-3-030-72019-3_10

where f is the function under test and P is some executable logical relationship between an input x and the output f(x). The test harness generates random values for x, hoping to either uncover a counterexample—an x for which ¬P(x, f(x)), indicating a bug—or else provide confidence that f is correct with respect to P.

With a well-designed random test case generator, property-based testing has a non-zero probability of generating every valid test case (up to a given size limit); property-based testing is thus guaranteed to find any bug that can be provoked by an input below the size limit... eventually. Unfortunately, since each input is generated independently, random testing may end up repeating the same or similar tests many times before happening across the specific input which provokes a bug. This poses a particular problem in settings like continuous integration, where feedback is needed quickly—it would be nice to have an automatic way to guide the generator to a more interesting and diverse set of inputs, "thinning" the distribution to find bugs with fewer tests.

Combinatorial testing, an elegant approach to testing from the software engineering literature [2, 16, 17], offers an attractive metric for judging which tests are most interesting. In its classical presentation, combinatorial testing advocates choosing tests to maximize t-way coverage of a program's input space—i.e., to exercise all possible choices of concrete values for every combination of t input parameters. For example, suppose a program p takes Boolean parameters w, x, y, and z, and suppose we want to test that p behaves well for every choice of values for every pair of these four parameters. If we choose carefully, we can check all such choices—all 2-way interactions—with just five test cases:


You can check for yourself: for any two parameters, every combination of values for these parameters is covered by some test. For example, "w = False and x = False" is covered by #1, while both "w = True and x = True" and "w = True and y = True" are covered by #5. Any other test case we could come up with would check a redundant set of 2-way interactions. Thus, we get 100% pairwise coverage with just five out of the 2<sup>4</sup> = 16 possible inputs. This advantage improves exponentially with the number of parameters.
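The claim can be machine-checked. The suite below is a hypothetical reconstruction consistent with the interactions quoted above (the paper's original table is not reproduced here), and `pairwise_covered` enumerates every pair of positions and every pair of values:

```python
# A hypothetical 5-test pairwise-covering suite for four Boolean
# parameters (w, x, y, z); the concrete rows are our own reconstruction.
from itertools import combinations, product

SUITE = [
    (False, False, False, False),  # #1: covers "w = False and x = False"
    (False, True,  True,  True),
    (True,  False, True,  True),
    (True,  True,  False, True),
    (True,  True,  True,  False),  # #5: covers "w = True and x = True"
]

def pairwise_covered(tests):
    """Every pair of positions, with every pair of values, is hit by
    some test in the suite (i.e., 100% 2-way coverage)."""
    return all(
        any(t[i] == u and t[j] == v for t in tests)
        for i, j in combinations(range(4), 2)
        for u, v in product([False, True], repeat=2)
    )
```

In this reconstruction no row is redundant: dropping any single test loses some 2-way interaction.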

Why is this interesting? Because surveys of real-world systems have shown that bugs are often provoked by specific choices of just a few parameters [16]. Indeed, one study involving a distributed database at NASA found that, out of 100 known failures, 93 were caused by 2-way parameter interactions; the remaining 7 failures were each caused by no more than 6 parameters interacting together [14]. This suggests that combinatorial testing is an effective way to choose test cases for real systems.

If combinatorial coverage can be used to concentrate bug-finding power into small sets of tests, it is natural to wonder whether it could also be used to thin the distribution of a random generator. So far, combinatorial testing has mostly been applied in settings where the input to a program is just a vector of parameters, each drawn from a small finite set. Could we take it further? In particular, could we transfer ideas from combinatorial testing to the richer setting addressed by QuickCheck—i.e., functional programs whose inputs are drawn from structured, potentially infinite data types like lists and trees?

Our first contribution is showing how to generalize the definition of combinatorial coverage to work with regular tree expressions, which themselves generalize the algebraic data types found in most functional languages. Instead of covering combinations of parameter choices, we measure coverage of test descriptions—concise representations of sets of tests, encoding potentially interesting interactions between data constructors. For example, the test description cons(true, ◇false) describes the set of Boolean lists that have true as their first element, followed by at least one false somewhere in the tail.

Our second contribution is a method for enhancing property-based testing using combinatorial coverage. We propose an algorithm that uses combinatorial coverage information to thin an existing random generator, leading it to more interesting test suites that find bugs more often. A concrete realization of this algorithm in a tool called QuickCover was able, in our experiments, to guide random generation to find bugs using an average of 10× fewer tests than QuickCheck. While generating test suites is (considerably) slower, running the tests can be much faster. As such, QuickCover excels in settings where tests are particularly costly to run, as well as in situations like continuous integration, where the cost of test generation is amortized over many runs of the test suite.

In summary, we offer these contributions:


We conclude with an overview of related work (Section 7), and ideas for future work (Section 8).

# **2 Classical Combinatorial Testing**

To set the stage, we begin with a brief review of "classical" combinatorial testing.

Combinatorial testing measures the "combinatorial coverage" of test suites, aiming to find more bugs with fewer tests. Standard presentations [16] are phrased in terms of a number of separate input parameters. Here, for notational consistency with the rest of the paper, we will instead assume that a program takes a single input consisting of a tuple of values.

Assume we are given some finite set C of constructors, and consider the set of n-tuples over C:

$$\{\mathsf{tuple}\_n(C\_1, \ldots, C\_n) \mid C\_1, \ldots, C\_n \in \mathcal{C}\}$$

(The "constructor" tuple<sup>k</sup> is not strictly needed in this section, but it makes the generalization to constructor trees and tree regular expressions in Section 3 smoother.) We can use these tuples to represent test inputs to systems. For example a web application might be tested under configurations

tuple4(Safari, MySQL, Admin, English)

in order to verify some end-to-end property of the system.

A specification of a set of tuples is written informally using notation like:

```
tuple4(Safari+Chrome, Postgres+MySQL, Admin+User, French+English)
```
This specification restricts the set of valid tests to those that have valid browsers in the first position, valid databases in the second, and so on. Specifications are thus a lot like types—they pick out a set of valid tests from some larger set. We define this notation precisely in Section 3.

To define combinatorial coverage, we introduce the notion of partial tuples, i.e., tuples where some elements are left indeterminate (written ⊤). For example:

tuple4(Chrome, ⊤, Admin, ⊤).

A description is compatible with a specification if its concrete (non-⊤) constructors are valid in the positions where they appear. Thus, the description above is compatible with our web-app configuration specification, while this one is not:

tuple4(MySQL, MySQL, French, ⊤)

We say a test covers a description—and, conversely, the description describes the test—when the tuple matches the description in every position that does not contain ⊤. For example, the description

```
tuple4(Chrome, ⊤, Admin, ⊤)
```
describes these tests:

tuple4(Chrome, MySQL, Admin, English)
tuple4(Chrome, MySQL, Admin, French)
tuple4(Chrome, Postgres, Admin, English)
tuple4(Chrome, Postgres, Admin, French)

Finally, we call a description t-way if it fixes exactly t constructors, leaving the rest as ⊤.

Now, suppose a system under test takes a tuple of configuration values as input. Given some correctness property (e.g., the system does not crash), a test for the system is simply a particular tuple, while a test suite is a set of tuples. We can then define combinatorial coverage as follows:

**Definition 1.** The t-way combinatorial coverage of a test suite is the proportion of t-way descriptions, compatible with a given specification, that are covered by some test in the suite.

We say that t is the strength of the coverage.

A test suite with 100% 2-way coverage for the present example can be quite small. For example,

> tuple4(Chrome, Postgres, Admin, English)
> tuple4(Chrome, MySQL, User, French)
> tuple4(Safari, Postgres, User, French)
> tuple4(Safari, MySQL, Admin, French)
> tuple4(Safari, MySQL, User, English)

achieves 100% coverage with just five tests. The fact that a single test covers many different descriptions is what makes combinatorial testing work: while the number of descriptions that must be covered is combinatorially large, a single test can cover combinatorially many descriptions. In general, for a tuple of size n, the number of descriptions is given by the $\binom{n}{t}$ ways to choose t parameters, multiplied by the number of distinct values each parameter can take on.
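Definition 1 can be checked directly for this example. In the sketch below (our own code, not the paper's), a 2-way description is a choice of two positions plus concrete constructors for them, giving $\binom{4}{2} \times 2 \times 2 = 24$ descriptions, all of which are covered by the five tests:

```python
# Checking Definition 1 for the web-app example: enumerate every 2-way
# description compatible with the specification and compute the
# proportion covered by a given suite.
from itertools import combinations

SPEC = [
    ["Safari", "Chrome"],
    ["Postgres", "MySQL"],
    ["Admin", "User"],
    ["French", "English"],
]

SUITE = [
    ("Chrome", "Postgres", "Admin", "English"),
    ("Chrome", "MySQL",    "User",  "French"),
    ("Safari", "Postgres", "User",  "French"),
    ("Safari", "MySQL",    "Admin", "French"),
    ("Safari", "MySQL",    "User",  "English"),
]

# all 2-way descriptions: two positions plus concrete constructors
DESCRIPTIONS = [
    ((i, c), (j, d))
    for i, j in combinations(range(4), 2)
    for c in SPEC[i] for d in SPEC[j]
]

def coverage(tests):
    """2-way combinatorial coverage (Definition 1) of a test suite."""
    covered = [
        ((i, c), (j, d)) for (i, c), (j, d) in DESCRIPTIONS
        if any(t[i] == c and t[j] == d for t in tests)
    ]
    return len(covered) / len(DESCRIPTIONS)
```

A single test covers one description per pair of positions, i.e. 6 of the 24, so one test alone already achieves 25% coverage.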

# **3 Generalizing Coverage**

Of course, inputs to programs are often more complex than just tuples of enumerated values, especially in the world of functional programming. To apply the ideas of combinatorial coverage in this richer world, we generalize tuples to constructor trees and tuple specifications to regular tree expressions. We can then give a generalized definition of test descriptions that makes sense for algebraic data types, setting up for a more powerful definition of combinatorial coverage.

A ranked alphabet Σ is a finite set of atomic data constructors, each with a specified arity. For example, the ranked alphabet

Σ<sub>list(bool)</sub> ≜ {(cons, 2), (nil, 0), (true, 0), (false, 0)}

defines the constructors needed to represent lists of Booleans. Given a ranked alphabet Σ, the set of trees over Σ is the least set T<sup>Σ</sup> that satisfies the equation

$$\mathcal{T}\_{\Sigma} = \{ C(t\_1, \dots, t\_n) \mid (C, \ n) \in \Sigma \land t\_1, \dots, t\_n \in \mathcal{T}\_{\Sigma} \}.$$

Regular tree expressions are a compact and powerful tool for specifying sets of trees [9, 10]. They are generated by the following syntax:

$$\begin{array}{l} e \triangleq \top \\ \mid e\_1 + e\_2 \\ \mid \mu X. \; e \\ \mid X \\ \mid C(e\_1, \ldots, e\_n) \text{ for } (C, \; n) \in \Sigma \end{array}$$

Each of these operations has an analog in standard regular expressions over strings: + corresponds to disjunction of regular expressions, μ corresponds to iteration, and the parent-child relationship corresponds to concatenation. These expressions give us a rich language for describing tree structures.

The denotation function [·] mapping regular tree expressions to sets of trees is the least function satisfying the equations:

$$\begin{aligned} \left[ \top \right] &= \mathcal{T}\_{\Sigma} \\ \left[ C(e\_1, \dots, e\_n) \right] &= \{ C(t\_1, \dots, t\_n) \mid t\_i \in \left[ e\_i \right] \} \\ \left[ e\_1 + e\_2 \right] &= \left[ e\_1 \right] \cup \left[ e\_2 \right] \\ \left[ \mu X.\ e \right] &= \left[ e[\mu X.\ e \,/\, X] \right] \end{aligned}$$

Regular tree expressions subsume standard first-order algebraic data type definitions. For example, the Haskell definition

```
data BoolList = Cons Bool BoolList | Nil
```

is equivalent to the regular tree expression

$$\mu X.\ \mathsf{cons}(\mathsf{true} + \mathsf{false},\ X) + \mathsf{nil}.$$

Crucially for our purposes, regular tree expressions can also be used to define sets of trees that cannot be described with plain ADTs. For example, the expression

cons(true + false, nil)

denotes all single-element Boolean lists, while

$$\mu X.\ \mathsf{cons}(\mathsf{true},\ X) + \mathsf{nil}$$

describes the set of lists that only contain true. Regular tree expressions can even express constraints like "true appears at some point in the list":

$$\mu X. \; \mathsf{cons}(\top,\,\,X) + \mathsf{cons}(\mathsf{true},\,\,\mu Y. \; \mathsf{cons}(\top,\,\,Y) + \mathsf{nil}).$$
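These definitions translate almost directly into code. The following minimal sketch (our own encoding, not part of the paper) implements a membership test for closed regular tree expressions, unrolling μ-binders on demand as the finite input tree is traversed:

```python
# Trees are ("name", [children]); expressions are tagged tuples.
TOP = ("top",)
def alt(a, b):      return ("alt", a, b)           # e1 + e2
def mu(x, e):       return ("mu", x, e)            # mu X. e
def var(x):         return ("var", x)
def con(c, *args):  return ("con", c, list(args))  # C(e1, ..., en)

def subst(x, s, e):
    """Substitute s for variable x in e (variables assumed distinct)."""
    tag = e[0]
    if tag in ("top",):
        return e
    if tag == "alt":
        return ("alt", subst(x, s, e[1]), subst(x, s, e[2]))
    if tag == "mu":
        return e if e[1] == x else ("mu", e[1], subst(x, s, e[2]))
    if tag == "var":
        return s if e[1] == x else e
    return ("con", e[1], [subst(x, s, a) for a in e[2]])

def matches(tree, e):
    """tree is in [e]; the mu case implements [mu X. e] = [e[mu X. e / X]]."""
    tag = e[0]
    if tag == "top":
        return True
    if tag == "alt":
        return matches(tree, e[1]) or matches(tree, e[2])
    if tag == "mu":
        return matches(tree, subst(e[1], e, e[2]))
    if tag == "var":
        return False  # closed expressions only
    name, children = tree
    return (name == e[1] and len(children) == len(e[2])
            and all(matches(t, d) for t, d in zip(children, e[2])))

# mu X. cons(true + false, X) + nil  --  the BoolList type
BOOL_LIST = mu("X", alt(con("cons", alt(con("true"), con("false")), var("X")),
                        con("nil")))

# "true appears at some point in the list", as in the last display above
SOMEWHERE_TRUE = mu("X", alt(
    con("cons", TOP, var("X")),
    con("cons", con("true"),
        mu("Y", alt(con("cons", TOP, var("Y")), con("nil"))))))

def as_tree(bools):
    """Encode a Python list of booleans as a constructor tree."""
    tree = ("nil", [])
    for b in reversed(bools):
        tree = ("cons", [("true" if b else "false", []), tree])
    return tree
```

Unrolling terminates here because each recursive call consumes part of the finite input tree.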

This machinery smoothly generalizes the structures we saw in Section 2. Tuples are just a special form of trees, while specifications and test descriptions can be written as regular tree expressions. This gives us most of what we need to define combinatorial coverage for algebraic data types.

Recall the definition of t-way combinatorial coverage: "the proportion of (1) t-way descriptions, (2) compatible with a given specification, that (3) are covered by some test in the suite." What does this mean in the context of regular tree expressions and trees?

Condition (3) is easy: a test (i.e., a tree) t covers a test description (a regular tree expression) d if t ∈ [d].

For (2), consider some regular tree expression τ representing an algebraic data type that we would like to cover. We say that a description d is compatible with τ if [τ] ∩ [d] ≠ ∅. As with string regular expressions, this can be checked efficiently.

The only remaining question is (1): which set of t-way descriptions to use. We argue in the next section that the set of all regular tree expressions is too broad, and we offer a simple and natural alternative.

# **4 Sparse Test Descriptions**

A naïve way to generalize the definition of t-way descriptions to regular tree expressions would be to first define the size of a regular tree expression as the number of operators (constructors, +, or μ) in it and then define a t-way description to be any regular tree expression of size t. However, this approach does not specialize nicely to the classical case; for example, the description

tuple4(Safari + Chrome, ⊤, ⊤, ⊤)

would be counted as "4-way" (3 constructors and 1 "+" operator), even though it is covered by every well-formed test. Worse, "interesting" descriptions are often quite large. For example, the smallest possible description of lists in which true is followed by false,

μX. cons(⊤, X) + cons(true, μY. cons(⊤, Y) + cons(false, μZ. cons(⊤, Z) + nil))

has size t = 14. We want a representation that packs as much information as possible into small descriptions, making t-way coverage meaningful for small values of t and increasing the complexity of the interactions captured by our definition of coverage.

In sum, we want a definition of coverage that straightforwardly specializes to the tuples-of-constructors case and that captures interesting structure with small descriptions.

Our proposed solution, described next, takes inspiration from temporal logic. We first encode an "eventually" (◇) operator that allows us to write the expression from above much more compactly as ◇cons(true, ◇false). This can be read as "somewhere in the tree, there is a cons node with a true node to its left and a false node somewhere in the tree to its right." Then we define a restricted form of sparse test descriptions using just ◇, ⊤, and constructors.

### **4.1 Encoding "Eventually"**

The "eventually" operator can actually be encoded using the regular tree expression operators we have already defined—i.e., we can add it without adding any formal power. First, define the set of templates for the ranked alphabet Σ:

$$\mathbb{T} \triangleq \{ C(\top\_1, \dots, \top\_{i-1}, \ [], \ \top\_{i+1}, \dots, \ \top\_n) \mid (C, \ n) \in \Sigma, \ 1 \le i \le n \} $$

That is, for each constructor C in Σ, the set of templates T contains C([ ], ⊤, ..., ⊤), C(⊤, [ ], ⊤, ..., ⊤), etc., all the way to C(⊤, ..., ⊤, [ ]), enumerating every way to place one hole in the constructor and fill every other argument slot with ⊤. (Nullary constructors are ignored.) Then we define "next" (◦e) and "eventually" (◇e) as

$$\circ e \stackrel{\Delta}{=} \sum\_{T \in \mathbb{T}} T[e]$$

$$\lozenge e \stackrel{\Delta}{=} \mu X. \; e + \circ X$$

where T[e] is the replacement of [ ] in T with e.<sup>3</sup> Intuitively, ◦e describes any tree C(t<sub>1</sub>, ..., t<sub>n</sub>) in which e describes some direct child (i.e., t<sub>1</sub>, t<sub>2</sub>, and so on), while ◇e describes anything described by e, plus (unrolling the μ) anything described by ◦e, ◦◦e, and so on.

This is not the only way to design a compact, expressive subset of regular tree expressions, but our evaluation shows that it has useful properties. In addition, the ◇ notation gives an elegant way to write descriptions like the one from the previous section (◇cons(true, ◇false)), neatly capturing "somewhere in the tree" constraints that would require many more symbols in the bare language of regular tree expressions.

### **4.2 Defining Coverage**

Even in the language with just ◇, ⊤, and constructors, there is still a fair amount of freedom in how we define the set of t-way descriptions. In this section we present one possibility that we have found useful in practice; in Section 8 we discuss another interesting option.

The set of sparse test descriptions for a given Σ is the trees generated by

$$\begin{aligned} d & \triangleq \top \\ & \mid \Diamond C(d\_1, \dots, d\_n) \text{ for } (C, \; n) \in \Sigma, \end{aligned}$$

that is, trees consisting of ◇-prefixed constructors and ⊤. We call these descriptions "sparse" because they match specific ancestor-descendant arrangements of

<sup>3</sup> This construction is why we choose to deal with finite ranked alphabets: if Σ were infinite, 𝕋 would be infinite, and ◦e would be an infinite term that is not expressible as a standard regular tree expression.

constructors but place no restriction on the constructors in between, due to the "eventually" before each constructor.

Sparse test descriptions are designed to be compact, useful in practice, and compatible with the classical definition of coverage. For that reason we aim to keep them as information-dense as possible. First, we do not include the μ operator directly, instead relying on ◊: indeed, ◊ captures a pattern of recursion that is general enough to express interesting non-local constraints while keeping description complexity low. Similarly, we do not need to include the + operator: any test that covers either C(d1, ..., dn) or D(d1, ..., dm) will also necessarily cover C(d1, ..., dn) + D(d1, ..., dm).

Removing explicit uses of μ and + does limit the expressive power of sparse test descriptions a little—for example it rules out complex mutually recursive definitions. However, we do not intend to use descriptions to specify entire languages, only fragments of languages that we hope to cover with testing. Naturally, there are many other possible formats for test descriptions that would be interesting to explore—we leave that for future work. In this paper, we chose to make descriptions very compact while preserving most of their expressive power, and the case studies in Section 6 demonstrate that such a choice works well in at least two challenging domains that are relevant to programming languages as a whole.

Finally, we define the size of a description as the number of constructors it contains. Intuitively, a t-way description is one with t constructors; however, in order to be consistent with the classical definition, we omit constructors whose types permit no alternatives. For example, all of the tuple constructors (e.g., tuple₄ in our running example) are left out of the size calculation. This makes t-way sparse test description coverage specialize to exactly classical t-way parameter interaction coverage in the case of tuples of sums of nullary constructors.

Sparse descriptions work as expected for types like

$$\text{tuple}_4(\mathsf{Safari} + \mathsf{Chrome},\ \mathsf{Postgres} + \mathsf{MySQL},\ \mathsf{Admin} + \mathsf{User},\ \mathsf{French} + \mathsf{English}).$$

Despite some stray occurrences of ◊, as in

$$\lozenge\mathtt{tuple}_4(\lozenge\mathtt{Chrome},\ \lozenge\mathtt{MySQL},\ \top,\ \top),$$

the descriptions still describe the same sets of tests as the standard tuple descriptions without the uses of ◊. Thus, our new definition of combinatorial coverage generalizes the classical one.

These descriptions capture a rich set of test constraints in a compact form. The real proof of this is in our evaluation results—see Section 6 for those—but a few more examples may help illustrate.

Boolean Lists As a first example, consider the type of Boolean lists:

$$\tau_{\mathrm{list(bool)}} \triangleq \mu X.\ \mathtt{cons}(\mathtt{true} + \mathtt{false},\ X) + \mathtt{nil}.$$

The set of all 2-way descriptions that are compatible with τlist(bool) is:

$$\begin{array}{lll} \lozenge\mathtt{cons}(\lozenge\mathtt{true},\ \top) & \lozenge\mathtt{cons}(\lozenge\mathtt{false},\ \top) & \lozenge\mathtt{cons}(\top,\ \lozenge\mathtt{nil}) \\ \lozenge\mathtt{cons}(\top,\ \lozenge\mathtt{cons}(\top,\ \top)) & \lozenge\mathtt{cons}(\top,\ \lozenge\mathtt{true}) & \lozenge\mathtt{cons}(\top,\ \lozenge\mathtt{false}) \end{array}$$

Unpacking the notation, ◊cons(◊true, ⊤) describes the set of trees where "at some point in the tree there is a cons node with a true node somewhere in its left child."
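To make the counting concrete, the enumeration of t-way descriptions for τ<sub>list(bool)</sub> can be sketched as follows. This is our own illustration, not the paper's code; the `REACH` table records which constructors can appear (at any depth, via ◊) inside a value of each type.

```python
from itertools import product

# Argument types of each constructor, and the constructors reachable
# (at any depth, via "eventually") inside a value of each type.
ARITY = {"cons": ["bool", "list"], "nil": [], "true": [], "false": []}
REACH = {"bool": ["true", "false"],
         "list": ["cons", "nil", "true", "false"]}  # bools occur inside lists

TOP = "T"

def splits(k, n):
    """All ways to distribute k constructors over n argument slots."""
    if n == 0:
        return [[]] if k == 0 else []
    return [[i] + rest for i in range(k + 1) for rest in splits(k - i, n - 1)]

def descs(ty, k):
    """Sparse descriptions with exactly k constructors, for values of type ty."""
    if k == 0:
        return [TOP]
    out = []
    for con in REACH[ty]:
        slots = ARITY[con]
        for sizes in splits(k - 1, len(slots)):
            for kids in product(*(descs(s, n) for s, n in zip(slots, sizes))):
                out.append(("ev", con, list(kids)))
    return out

two_way = descs("list", 2)  # exactly the six descriptions listed above
```

Running `descs("list", 2)` yields the six 2-way descriptions shown above, e.g. `("ev", "cons", [("ev", "true", []), "T"])` for ◊cons(◊true, ⊤).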

Arithmetic Expressions Consider the type of simple arithmetic expressions over the constants 0, 1, and 2:

$$\tau_{\mathrm{expr}} \triangleq \mu X.\ \mathtt{add}(X,\ X) + \mathtt{mul}(X,\ X) + 0 + 1 + 2.$$

This type has 2-way descriptions like

$$\lozenge\mathtt{add}(\lozenge\mathtt{mul}(\top,\top),\ \top) \ \text{ and } \ \lozenge\mathtt{mul}(\top,\ \lozenge\mathtt{add}(\top,\top)),$$

which capture different nestings of addition and multiplication.

System F For a more involved example, let's look at some 2-way sparse descriptions for a much more complex data structure: terms of the polymorphic lambda calculus, System F.

$$\begin{aligned} \tau & \triangleq () \mid \tau_1 \to \tau_2 \mid n \mid \forall.\ \tau \\ e & \triangleq () \mid n \mid \lambda\tau.\ e \mid (e_1\ e_2) \mid \Lambda.\ e \mid (e\ \tau) \end{aligned}$$

(We use de Bruijn indices for variable binding, meaning that each variable occurrence in the syntax tree is represented by a natural number indicating which enclosing abstraction it was bound by.)

System F syntax can be represented using a regular tree expression like

$$\mu X.\ \mathtt{unit} + \mathtt{var}(\textsc{Var}) + \mathtt{abs}(\textsc{Type},\ X) + \mathtt{app}(X,\ X) + \mathtt{tabs}(X) + \mathtt{tapp}(X,\ \textsc{Type}),$$

where Type is defined in a similar way and Var represents natural-number de Bruijn indices.

This already admits useful 2-way descriptions like

$$\lozenge\mathtt{app}(\lozenge\mathtt{abs}(\top,\top),\top) \text{ and } \lozenge\mathtt{app}(\lozenge\mathtt{app}(\top,\top),\top),$$

which capture relationships between lambda abstractions and applications. In Section 6.1, we use descriptions like these to find bugs in an evaluator for System F expressions; they ensure that our test suite adequately covers different nestings of abstractions and applications that might provoke bugs.

With a little domain-specific knowledge, we can make the descriptions capture even more. When setting up our case study in Section 6.2, which searches for bugs in GHC's strictness analyzer, we found that it was often useful to track coverage of the seq function, which takes two arguments, evaluates the first (triggering any effects, e.g., exceptions), and then evaluates the second. Modifying our regular tree expression type to include seq as a first-class constructor means that 2-way descriptions now include interactions like

$$\lozenge \mathsf{seq}(\lozenge \mathsf{app}(\top,\top),\top)$$

that encode interactions of seq with other System F constructors. These interactions are crucial for finding bugs in a strictness analyzer, since seq gives fine-grained control over the evaluation order within a Haskell expression.

# **5 Thinning Generators with QuickCover**

Having generalized the definition of combinatorial coverage to structured data types, the next step is to explore ways of using coverage to improve property-based testing.

When we first approached this problem, we planned to follow the conventional combinatorial testing methodology of generating covering arrays [38], i.e., test suites with 100% t-way coverage for a given t. Rather than use an unbounded stream of random tests—the standard methodology in property-based testing—we would test properties using just the tests in some pre-generated covering array. However, we encountered two major problems with this approach. First, as t grows, covering arrays become frighteningly expensive to generate. While there are efficient methods for generating covering arrays in special cases like 2-way coverage [8], general algorithms for generating compact covering arrays are complex and often slow [23]. Second, we found that covering arrays for sets of test descriptions in the format described above did not do particularly well at finding bugs! In a series of preliminary experiments with one of our case studies, we found that with 4-way coverage (the highest we could generate in reasonable time), our covering arrays did not reliably catch all of the bugs in our test system. Fortunately, after some more head scratching and experimenting, we discovered an alternate approach that works quite well. The trick is to embrace the randomness that makes property-based testing so effective.

In the remainder of this section, we first present an algorithm that uses combinatorial coverage to "thin" a random generator, guiding it to more interesting inputs. Rather than generating a fixed set of tests in the style of covering arrays, this approach produces an unbounded stream of interesting test inputs. Then we discuss some concrete details behind QuickCover, the Haskell implementation of our algorithm that we used to obtain the experimental results in Section 6.

### **5.1 Online Generator Thinning**

The core of our algorithm is QuickCheck's standard generate-and-test loop. Given a test generator gen and a property p, QuickCheck generates inputs repeatedly until either (1) the property fails, or (2) a time limit is reached.

```
QuickCheck(gen , p):
  repeat LIMIT times:
    # Generate 1 new input
    x = gen()
    # Check the property
    if !p(x), return False
  return True
```
LIMIT is chosen based on the user's specific testing budget, and it can vary significantly in practice. In the experiments below, we know a priori that a bug exists in the program, so we set LIMIT to infinity and just run tests until the property fails.

Our algorithm modifies this basic one to use combinatorial coverage information when choosing the next test to run.

```
QuickCover(strength , fanout , gen , p):
  coverage = initCoverage()
  repeat LIMIT times:
    # Generate fanout potential inputs
    xs = listOf(gen(), fanout)
    # Find the input with the best improved coverage
    x = argmax[x in xs](
      coverageImprovement(x, coverage , strength) )
    # Check the property
    if !p(x), return False
    # Update the coverage information
    coverage = updateCoverage(x, coverage , strength)
  return True
```
The key idea is that, instead of generating a single input at each iteration, we generate several (controlled by the parameter fanout) and select the one that increases combinatorial coverage the most. We test the property on that input and, if it does not fail, update the coverage information based on the test we ran and keep going.
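The loop above can be rendered as runnable (if simplified) Python. This is our own sketch, not the paper's implementation: `descriptions(x, strength)` is assumed to be a user-supplied function returning the descriptions covered by input x, and coverage is tracked as a multi-set of counts.

```python
import random
from collections import Counter

def quickcover(strength, fanout, gen, p, descriptions, limit=10000):
    """A sketch of QuickCover's generate-and-test loop: among `fanout`
    candidates, run the one that most improves combinatorial coverage."""
    coverage = Counter()
    for _ in range(limit):
        # Generate fanout potential inputs ...
        xs = [gen() for _ in range(fanout)]
        # ... and keep the one with the best coverage improvement:
        # a description covered n times before contributes 1/(n+1).
        best = max(xs, key=lambda x: sum(1 / (coverage[d] + 1)
                                         for d in descriptions(x, strength)))
        if not p(best):
            return best                      # counterexample found
        coverage.update(descriptions(best, strength))
    return None                              # no failure within the budget

# Toy usage: pairs over {0..3}^2, with "descriptions" playing the role of
# classical 1-way and 2-way parameter interactions.
random.seed(1)
gen = lambda: (random.randrange(4), random.randrange(4))
ds = lambda x, t: [("a", x[0]), ("b", x[1]), ("ab", x)]
cx = quickcover(2, 4, gen, lambda x: x != (3, 3), ds)
```

In the toy run, coverage steers the loop toward inputs whose interactions have been exercised least, so the single failing input (3, 3) is selected soon after it is first generated.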

This algorithm is generic with respect to the representation for coverage information, but the particular choice of data structure and interpretation makes a significant difference in both efficiency and effectiveness. In our implementation, coverage information is represented by a multi-set of descriptions:

```
initCoverage():
  return emptyMultiset()
coverageImprovement(x, coverage , strength):
  ds = descriptions(x, strength)
  return sum([ 1 / (count(d, coverage) + 1)
               for d in ds ])
updateCoverage(x, coverage , strength):
  return union(descriptions(x, strength), coverage)
```
At the beginning, the multi-set is empty; as testing progresses, each test is evaluated based on coverageImprovement. If a description d had previously been covered n times, it contributes 1/(n + 1) to the score. For example, if a test input covers d<sup>1</sup> and d<sup>2</sup>, where previously d<sup>1</sup> was not covered and d<sup>2</sup> was covered 3 times, the total score for the test input would be 1 + 0.25 = 1.25.
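The worked example can be checked directly; a minimal sketch using Python's `Counter` as the multi-set (the names d1 and d2 are just the abstract descriptions from the example):

```python
from collections import Counter

# Multi-set of previously covered descriptions: d2 covered 3 times, d1 never.
coverage = Counter({"d2": 3})

def coverage_improvement(ds, coverage):
    # A description covered n times before contributes 1/(n + 1) to the score.
    return sum(1 / (coverage[d] + 1) for d in ds)

score = coverage_improvement(["d1", "d2"], coverage)  # 1 + 0.25 = 1.25
```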

At first glance, one might think of a simpler approach based on sets instead of multi-sets. Indeed, this was the first thing we tried, but it turned out to perform substantially worse than the multiset-based one in our experiments. The reason is that just covering each description once turns out not to be sufficient to find all bugs, and, once most descriptions have been covered, this approach essentially degenerates to normal random testing. By contrast, the multi-set representation continues to be useful over time; after each description has been covered once, the algorithm begins to favor inputs that cover descriptions a second time, then a third time, and so on. This allows QuickCover to generate arbitrarily large test suites that continue to benefit from combinatorial coverage.

Keeping track of coverage information like this does create some overhead.<sup>4</sup> For each test that QuickCover considers (including those that are never run), it needs to analyze which descriptions the test covers and check those against the current multi-set. This overhead means that QuickCover is often much slower than QuickCheck with respect to generating tests. In the next section, we explore use cases for QuickCover that overcome this overhead by running fewer tests.

# **6 Evaluation**

Since QuickCover adds some overhead to generating tests, one might expect that it will be particularly well suited to situations where each test may be run many times. The primary goal of our experimental evaluation was to test this hypothesis.

<sup>4</sup> The overhead introduced is highly variable and based largely on the exact implementation of the underlying test generator. Appendix A goes into slightly more detail on the asymptotics, but broadly speaking the time it takes QuickCover to generate a test is linear in the fan-out and exponential in the coverage strength.

Of course, running the same test repeatedly on the same code is pointless: if it were ever going to fail, it would do so on the first run (ignoring the thorny possibility of "flaky tests" due to nondeterminism [25]). However, running the same test on successive versions of the code is not only useful; it is standard practice in two common settings: regression testing, i.e., checking that code is still working after changes, and especially continuous integration, where regression tests are run automatically every time a developer checks in a new version of the code. In these settings, the overhead introduced by generating many tests and discarding some without running them can be amortized, since the same tests may be reused very many times, so that the cost of generating the test suite becomes less important than the cost of running it.

In order to validate this theory, we designed two experiments using Quick-Cover. The primary goal of these experiments was to answer the question: Does QuickCover actually reduce the number of tests needed to find bugs in a real system?

Both case studies answer this question in the affirmative. The first case study, in particular, demonstrates a situation where QuickCover needs an average of 10× fewer tests to find bugs, compared to pure random testing. We chose an evaluator for System F terms as our example because it allows us to test how QuickCover behaves in a small but realistic scenario that requires a fairly complex random testing setup. Our second case study expands on results from Palka et al. [32], scaling up and applying QuickCover to find bugs in the Glasgow Haskell Compiler (GHC) [27].

A secondary goal of our evaluation was to understand whether the generator thinning overhead is always too high to make QuickCover useful for real-time property-based testing, or if there are any cases where using QuickCover would yield a wall-clock improvement even if tests are only run once. Our second case study answers this question in the affirmative.

### **6.1 Case Study: Normalization Bugs in System F**

Our first case study uses combinatorial coverage to thin a highly tuned and optimized test generator for System F [12, 35] terms. The generator produces well-typed System F terms by construction (no mean feat on its own) and is tuned to produce a highly varied distribution of different terms. Despite all the care put into the base generator, we found that modifying the test distribution using QuickCover results in a test suite that finds bugs with many fewer inputs.

Generating "interesting" programs (for finding compiler bugs, for example) is an active research area. For instance, a generator for well-typed simply typed lambda-terms has been used to reveal bugs in GHC [6, 20, 32], while a generator for C programs that avoid "undefined behaviors" has been used to find many bugs in production compilers [24, 34, 41]. The cited studies are all examples of differential testing, where different compilers (or different versions of the same compiler) were run against each other on the same inputs to reveal discrepancies. Similarly, for the present case study we tested different evaluation strategies for System F, comparing the behavior of various buggy versions to a reference implementation.

Recall the definition of System F from Section 4.2. Let e[v/n] stand for substituting v for variable n in e, and e ↑<sup>n</sup> for "lifting"—incrementing the indices of all variables above n in e. Then, for example, the standard rule for substituting a type τ for variable n inside a type abstraction Λ. e requires lifting τ and incrementing the de Bruijn index of the variable being substituted by one:

$$(\Lambda.\ e)[\tau/n] = \Lambda.\ e[\tau \uparrow_0\ /\ n + 1]$$

Here are two ways to get this wrong: forget to lift the variables, or forget to increment the index. Those bugs would lead to the following erroneous definitions (each omits one operation, shown in red in the original):

$$(\Lambda.\ e)[\tau/n] = \Lambda.\ e[\tau\ /\ n + 1] \quad \text{and} \quad (\Lambda.\ e)[\tau/n] = \Lambda.\ e[\tau \uparrow_0\ /\ n].$$

Inspired by errors like these (specifically in the substitution and variable lifting functions), we inserted bugs by hand to create 19 "mutated" versions of two different evaluation relations. (The bugs are described in detail in Appendix C.) The two evaluation relations simplify terms in slightly different ways: the first implements standard big-step evaluation (eval), and the second uses a parallel evaluation relation to fully normalize terms (peval). (We chose to check both evaluation orders, since some mutations only cause a bug in one implementation or the other.) Since we were interested in bugs in either evaluation order, we tested a joint property:

### eval e == eval mutated e && peval e == peval mutated e

Starting with a highly tuned generator for System F terms as our baseline, we used both QuickCheck and QuickCover to generate a stream of test values for e and measured the average number of tests required to find a bug (i.e., Mean-Tests-To-Failure, or MTTF) for each approach.

Surprisingly, we found little or no difference in MTTF between 2-way, 3-way, and 4-way testing, but changing the fan-out did make a large impact. Figure 1 shows both absolute MTTF for various choices of fan-out (log₁₀ scale) and the performance improvement as a ratio of un-thinned MTTF to thinned MTTF. All choices of fan-out produced better MTTF results than the baseline, but higher values of fan-out tended to be more effective on average. In our best experiment, a fan-out of 30 found a bug in an average of 15× fewer tests than the baseline; the overall average was about 10× better. Figure 2 shows the total MTTF improvement across 19 bugs, compared to the maximum theoretical improvement. If our algorithm were able to perfectly pick the best test input every time, the improvement would be proportional to the fan-out (i.e., it is impossible for our algorithm to be more than 10× better with a fan-out of 10). On the other hand, if combinatorial coverage were irrelevant to test failure, then we would expect the QuickCover test suites to have the same MTTF as QuickCheck. It is clear from the figure that QuickCover is really quite effective in this setting: for small

**Fig. 1.** Top: System F MTTF, log₁₀ scale, plotted in order of MTTF for un-thinned random tests, t = 2. Bottom: System F ratio of MTTF for un-thinned random tests to MTTF for QuickCover, t = 2.

fan-outs, it is very close to the theoretical optimum, and with a fan-out of 30 it achieves about 1/3 of the potential improvement—that is, three QuickCover test cases are more likely to provoke a bug than thirty QuickCheck ones.

### **6.2 Case Study: Strictness Analysis Bugs in GHC**

To evaluate how our approach scales, and to investigate whether QuickCover can be used not only to reduce the number of tests required but also to speed up bugfinding, we replicated the case study of Palka et al. [32], which found bugs in the

**Fig. 2.** System F, proportional reduction in total number of tests needed to find all bugs.

strictness analyzer of GHC 6.12 using a hand-crafted generator for well-typed lambda terms; we replicated their experimental setup, but used QuickCover to thin their generator and produce better tests.

Two attributes of this case study make it an excellent test of the capabilities of our combinatorial thinning approach. First, it found bugs in a real compiler by generating random well-typed lambda terms, and therefore we can evaluate whether the reduction in number of tests observed in the System F case study scales to a production setting. Second, running a test involves invoking the GHC compiler, a heavyweight external process. As a result, reducing the number of tests required to provoke a failure should (and does) lead to an observable improvement in terms of wall-clock performance.

Concretely, Palka et al. generate a list of functions that manipulate lists of integers and compare the behavior of these functions on partial lists (lists with undefined elements or tails) when compiled with and without optimizations, another example of differential testing. They uncover errors in the strictness analyzer component of GHC's optimizer that lead to inconsistencies where the un-optimized version of the compiled code correctly fails with an error while the optimized version prints something to the screen before failing.


Finally, to balance the costly compiler invocation with the similarly costly smart generation process, Palka et al. group 1000 generated functions together in a single module to be compiled; this number was chosen to strike a precise 50-50 balance between generation time and compilation/execution time for each generated module. Since our thinning approach itself introduces approximately a 25% overhead in generation time, we increased the number of tests per module to 1250 to maintain the same balance and make a fair comparison.

We ran our experiments in VirtualBox running Ubuntu 12.04 (a version old enough to allow for executing GHC 6.12.1), with 4GB of RAM, on a host machine with an i7-8700 @ 3.2GHz. We performed 100 runs of the original case study and 100 runs of our variant that adds combinatorial thinning, using a fan-out of 2 and a strength of 2. We found that our approach reduces the mean number of tests required from 21268 ± 1349 to 14895 ± 1056, a 42% improvement, and reduces the mean time to failure from 193 ± 13 seconds to 149 ± 12, a 30% improvement.

# **7 Related Work**

A detailed survey of the (vast) combinatorial testing literature can be found in [30]. Here we discuss just the most closely related work, in particular, other attempts to generalize combinatorial testing to structured and infinite domains. We also discuss other approaches to property-based testing with similar goals to ours, such as adaptive random testing and coverage-guided fuzzing.

### **7.1 Generalizations of Combinatorial Testing**

Salecker and Glesner [37] extend combinatorial testing to sets of terms generated by a context-free grammar. Their approach cleverly maps context-free grammar derivations up to some depth k to sets of parameter choices; then it uses standard full-coverage test suite generation algorithms to pick a subset of derivations to test. The main limitation of this approach is the parameter k. By limiting the derivation depth, this approach only defines coverage over a finite subset of the input type. By contrast, our definition of coverage works over infinite types by exploiting the recursive nature of the ◊ operator. We focus on description size rather than term size, which provides more flexibility for "packing" multiple descriptions into a single test.

Another approach to combinatorial testing of context-free inputs is due to Lämmel and Schulte [19]. Their system also uses a depth bound, but it provides the user finer-grained control. At each node in the grammar, the user is free to limit the coverage requirements and prune unnecessary tests. This is an elegant solution for situations where the desired interactions are known a priori. Unfortunately, this approach needs to be re-tuned manually for every specific type and use-case, so it is not the general solution we were after.

Finally, Kuhn et al. [15] present a notion of sequence covering arrays to describe combinatorial coverage of sequences of events. We believe that t-way sequence covering arrays in their system are equivalent to (2t−1)-way full-coverage test suites of the appropriate list type in ours. They also have a reasonably efficient algorithm for generating covering arrays in this specialized case.

Our idea to use regular tree expressions for coverage is partly inspired by Usaola et al. [40] and Mariani et al. [26]. Rather than generate a set of terms to cover an ADT, these works generate strings to cover (i.e., match in every possible way) a particular regular expression. This turns out to be quite a different problem, but these explorations led us to consider coverage in the context of formal languages.

### **7.2 Comparison with Enumerative Property-Based Testing**

Another approach to property-based testing research is based on enumeration of small test cases, rather than random generation. Tools like SmallCheck [36] offer guarantees that there is no counterexample smaller than a certain limit, and moreover always report the smallest counterexample when it exists. To compare our approach with this type of tool, we repeated our System F evaluation with a variety of enumerative testing tools.

We first tried SmallCheck, which enumerates all test cases up to a given depth. Unfortunately, the number of System F terms rises very rapidly with the depth: SmallCheck quickly enumerated 708 terms of depth up to three, but could not enumerate all terms of depth four within 20 minutes of CPU time.<sup>5</sup> Only one of the 19 bugs we planted was provoked by any of those 708 terms.

However, SmallCheck wastes effort generating syntactically correct terms that are not type correct; only 140 of the 708 were well-typed. Lazy SmallCheck [36] exploits laziness in property preconditions to discard many test cases in a group—in this case, all those terms that fail a type-check in the same way are discarded together. Because well-typedness is such a strong precondition, Lazy SmallCheck is able to dramatically reduce the number of terms needed at each depth, enabling us to increase the depth limit to 4 and generate over five million terms. The result was a much more comprehensive test suite than normal SmallCheck, but it still only found 8 out of our 19 bugs.

The problem here is that the smallest counterexamples we are searching for are quite small terms, but may nevertheless have a few fairly deep nodes in their syntax trees. More recent enumerative tools, such as LeanCheck [3], enumerate test cases in size order, instead of in depth order, thus reaching terms with just a few deeper nodes much earlier in the enumeration. For this example, LeanCheck runs out of memory after about 11 million tests, but this was enough to find all but four of the planted bugs.

However, LeanCheck does not use the Lazy SmallCheck optimization, and so is mostly testing ill-typed terms, for which our property holds vacuously. SciFe [18] enumerates in size order and uses the Lazy SmallCheck optimization, with good results. It is hard to apply SciFe, which is designed to test Scala, to our Haskell code, so instead we created a Lazy SmallCheck variant that enumerates in size order. With this variant, we could find all of the planted bugs, with counterexample sizes varying from 5 to 14. Lazy SmallCheck does not report the number of tests needed to find a counterexample, just the size at which it was found, together with the number of test cases of each size. We can therefore only

<sup>5</sup> Compiled with ghc -O2, on an Intel i7-6700k with 32GB of RAM under Windows 10.

give a lower bound for the number of tests needed to find each bug. Figure 3 plots this lower bound against the average number of tests needed by QuickCheck and by QuickCover. For these bugs, it is clear that the enumerative approach is not competitive with QuickCheck, let alone with QuickCover. The improvement in the numbers of tests needed ranges from 1.7 to 5.5 orders of magnitude, with a mean across all the bugs of 3.3 orders of magnitude.

**Fig. 3.** System F MTTF for QuickCheck and QuickCover, and the lower bound on the number of tests run by our Lazy SmallCheck variant, log₁₀ scale.

### **7.3 Comparison with Fuzzing Techniques**

Coverage-guided fuzzing tools like AFL [22] can be viewed as a way of using a different form of feedback (branch instead of combinatorial coverage) to improve the generation of random inputs by finding more "interesting" tests. Fuzzing is a huge topic [43] that has exploded in popularity recently, with researchers evaluating the benefits of using more forms of feedback [13, 31], incorporating learning [28,33] or symbolic [39,42] techniques, and bringing the benefits of these methods to functional programming [11, 21]. One fundamental difference, however, is that all of these techniques are online and grey-box: they instrument and execute the program on various inputs in order to obtain feedback. In contrast, combinatorial coverage can be computed without any knowledge of the code itself, therefore providing a convenient black-box alternative that can be valuable when the same test suite is to be used for many versions of the code (such as

in regression testing) or when executing the code is costly (such as when testing production compilers).

Chen et al.'s adaptive random testing (ART) [4] uses an algorithm that, like QuickCover's, generates a set of random tests and selects the most interesting to run. Rather than using combinatorial coverage, ART requires a distance metric on test cases—at each step, the candidate which is farthest from the already-run tests is selected. Chen et al. show that this approach finds bugs after fewer tests, on average, in the programs they study. ART was first proposed for programs with numerical inputs, but Ciupa et al. [5] showed how to define a suitable metric on objects in an object-oriented language and used it to obtain a reduction of up to two orders of magnitude in the number of tests needed to find a bug. Like combinatorial testing, ART is a black-box approach that depends only on the test cases themselves, not on the code under test.

However, Arcuri and Briand [1] question ART's value in practice, because of the quadratic number of distance computations it requires, from each new test to every previously executed test; in a large empirical study, they found that the cost of these computations made ART uncompetitive with ordinary random testing. While our approach also has significant computational overhead, the time and space complexity grow with the number of possible descriptions (derived from the data type definition and the choice of strength) and not with the total number of tests run—i.e., testing will not slow down over time. In addition, our approach works in situations where a distance metric between inputs does not make sense.

# **8 Conclusion and Future Work**

We have presented a generalized definition of combinatorial coverage and an effective way to use it for property-based testing, extending combinatorial coverage to the realm of algebraic data types with the help of regular tree expressions. Our sparse test descriptions provide a robust way to look at combinatorial testing, one that specializes to the classical approach. We use these sparse descriptions as the basis for QuickCover—a tool that thins a random generator to increase combinatorial coverage. Two case studies show that QuickCover is useful in practice, finding bugs using an average of 10× fewer tests.

The rest of this section sketches a number of potential directions for further research.

### **8.1 Variations**

Our experiments show that sparse test descriptions are a good way to define combinatorial coverage for algebraic data types, but they are certainly not the only way. Here we discuss some variations and why they might be interesting to explore.

Representative Samples of Large Types Perhaps it is possible to do combinatorial testing with ADTs by having humans decide exactly which trees to cover. This approach is already widely used in combinatorial testing to deal with types like machine integers that, though technically finite, are much too large for testing to efficiently cover all their "constructors." For example, if a human tester knows (by reading the code, or because they wrote it) that the code contains an if-statement guarded by x < 5, they might choose to cover

$$\mathbf{x} \in \{-2147483648, \ 0, \ 4, \ 5, \ 6, \ 2147483647\}.$$

The tester might choose values around 5 because those are important to the specific use case, and boundary values and 0 to check for common edge cases. Concretely, this practice means that instead of trying to cover tuple3(Int, true + false, true + false), the tester covers the specification

tuple3(−2147483648 + 0 + 4 + 5 + 6 + 2147483647, true + false, true + false).

In our setting, this might mean choosing a representative set of constructor trees to cover, and then treating them like a finite set. In much the same way as with integers, rather than cover

```
tuple3(τlist(bool), true + false, true + false),
```
we could treat a selection of lists as atomic constructors, and cover the specification

tuple3( [] + [true, false] + [false, false, false] , true + false, true + false)

which has 2-way descriptions like

tuple3([], ⊤, false) and tuple3([true, false], true, ⊤).

Just as testers choose representative sets of integers, they could choose sets of trees that they think are interesting and only cover those trees. Of course, the set of all trees for a type is usually much larger and more complex than the set of integers, so this approach may not be as practical for structured types as for integers. Still, it is possible that small amounts of human intervention could help guide the choice of descriptions to cover.
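A minimal sketch of this idea, with hypothetical helper names (`pin`, `twoWay`) and representative values rendered as strings for simplicity: once each position of a tuple has a hand-picked finite set of representatives, all 2-way descriptions can be enumerated directly.

```haskell
-- Representative values for each position of a tuple type.
type Reps = [[String]]

-- A description field: a wildcard or one pinned representative.
data Field = Top | Val String deriving (Eq, Show)

-- All descriptions that pin exactly the positions in ixs, choosing one
-- representative per pinned position and Top everywhere else.
pin :: Reps -> [Int] -> [[Field]]
pin reps ixs = go (zip [0 ..] reps)
  where
    go [] = [[]]
    go ((i, vs) : rest)
      | i `elem` ixs = [Val v : tl | v <- vs, tl <- go rest]
      | otherwise    = [Top : tl | tl <- go rest]

-- All 2-way descriptions: every pair of positions, every choice of values.
twoWay :: Reps -> [[Field]]
twoWay reps =
  concat [pin reps [i, j] | i <- [0 .. n - 1], j <- [i + 1 .. n - 1]]
  where n = length reps

main :: IO ()
main = print (length (twoWay reps))  -- 3 pairs of positions, 2*2 each: 12
  where reps = [["[]", "[true,false]"], ["true", "false"], ["true", "false"]]
```

The enumeration grows with the product of the chosen representative sets, which is exactly why small, hand-picked sets are essential here.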

Type-Tagged Constructors Another variation to our approach would change the way that ADTs are translated into constructor trees. In Appendix B we show a simple example of a translation for lists of Booleans, but an interesting problem arises if we consider lists of lists of Booleans. The most basic approach would be to use the same constructors (LCNil and LCCons) for both "levels" of list. For example, [[True]] would become (with a small abuse of notation)

LCCons (LCCons LCTrue LCNil) LCNil.

Depending on the application, it might actually make more sense to use different constructors for the different list types ([Bool] vs. [[Bool]]). For example, [[True]] could instead be translated as

LCOuterCons (LCInnerCons LCTrue LCInnerNil) LCOuterNil

(with a slight abuse of notation), allowing for a broader range of potential test descriptions. This observation can be generalized to any polymorphic ADT: any time a single constructor is used at multiple types, it is likely beneficial to differentiate between them by translating to constructor tree nodes tagged with a monomorphized type.

Pattern Descriptions A third potential variation is a modification to make test descriptions a bit less sparse. Recall that sparse test descriptions are defined as

$$d \triangleq \top \mid \Diamond C(d_1, \dots, d_n).$$

What if we chose this instead?

$$\begin{aligned} d &\triangleq \Diamond d'\\ d' &\triangleq \top \mid C(d'_1, \dots, d'_n) \end{aligned}$$

In the former case, every relationship is "eventual": there is never a requirement that a particular constructor appear directly beneath another. In the latter case, the descriptions enforce a direct parent-child relationship, and we simply allow the expression to match anywhere in the term. We might call this class "pattern" test descriptions.

We chose sparse descriptions for this work because putting ◇ before every constructor leaves more opportunities for nodes matching different descriptions to be "interleaved" within a term, leading to smaller test suites in general. In some small experiments, the pattern-based alternative seemed to perform similarly across the board but worse in a few cases. Even so, experimenting with the use of ◇ ("eventually") in descriptions might lead to interesting new ideas.
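To make the contrast concrete, here is a small matcher sketch (our own illustration, not QuickCover's implementation) for both semantics, with terms represented as generic constructor trees:

```haskell
-- Terms are constructor trees. A sparse description puts "eventually"
-- before every constructor; a pattern description anchors a whole
-- constructor pattern at one node, matched anywhere in the term.
data Term = Node String [Term] deriving (Eq, Show)

data Sparse = STop | SDia String [Sparse]  -- ⊤ | ◇C(d1, ..., dn)
data Pat    = PTop | PCon String [Pat]     -- ⊤ | C(d1, ..., dn)

-- ◇C(..): C may appear at or below the current node, and each child
-- description is again matched "eventually" below that C.
matchSparse :: Sparse -> Term -> Bool
matchSparse STop _ = True
matchSparse d@(SDia c ds) (Node c' ts) =
  (c == c' && length ds == length ts && and (zipWith matchSparse ds ts))
    || any (matchSparse d) ts

-- A pattern description must match as one contiguous subtree.
matchHere :: Pat -> Term -> Bool
matchHere PTop _ = True
matchHere (PCon c ds) (Node c' ts) =
  c == c' && length ds == length ts && and (zipWith matchHere ds ts)

matchPat :: Pat -> Term -> Bool
matchPat p t@(Node _ ts) = matchHere p t || any (matchPat p) ts

main :: IO ()
main = do
  -- cons(not(false), nil): "false" occurs below the first child of cons,
  -- but never as its direct child.
  let t = Node "cons" [Node "not" [Node "false" []], Node "nil" []]
  print (matchSparse (SDia "cons" [SDia "false" [], STop]) t)  -- True
  print (matchPat (PCon "cons" [PCon "false" [], PTop]) t)     -- False
```

The example term shows the difference: the sparse description ◇cons(◇false, ⊤) matches because false occurs somewhere below the first child, while the pattern cons(false, ⊤) does not, since it insists on a direct parent-child relationship.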

### **8.2 Combinatorial Coverage of More Types**

Our sparse tree description definition of combinatorial coverage is focused on inductive algebraic types. While these encompass a wide range of the types that functional programmers use, it is far from everything. One interesting extension would generalize descriptions to co-inductive types. We actually think that the current definition might almost suffice—regular tree expressions can denote infinite structures, so this generalization would likely only affect our algorithms and the implementation of QuickCover. We should also be able to include Generalized Algebraic Data Types (GADTs) without too much hassle. The biggest unknown is function types, which seem to require something more powerful than regular tree expressions to describe; indeed, it is not clear that combinatorial testing even makes sense for higher-order values.

### **8.3 Regular Tree Expressions for Directed Generation**

As we have shown, regular tree expressions are a powerful language for picking out subsets of types. In this paper, we mostly focused on automatically generating small descriptions, but it might be possible to apply this idea more broadly for specifying sets of tests. One straightforward extension would be to use the same machinery that we use for QuickCover but, instead of covering an automatically generated set of descriptions, ensure that, at a minimum, some manually specified set of expressions is covered. For example, we could use a modified version of our algorithm to generate a test set where

$$\mathsf{nil}, \quad \mathsf{cons}(\top, \mathsf{nil}), \quad \text{and} \quad \mu X.\ \mathsf{cons}(\mathsf{true},\ X) + \mathsf{nil}$$

are all covered. (Concretely, this would be a test suite containing, at a minimum, the empty list, a singleton list, and a list containing only true.) This might be useful for cases where the testers know a priori that certain shapes of inputs are important to test, but they still want to explore random inputs with those shapes.
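A minimal interpreter for such expressions, specialized to Boolean lists (our own sketch; it supports a single recursion variable bound by μ, which is enough for the example above):

```haskell
-- Regular tree expressions over Boolean-list constructors:
-- nil, cons(b, e), sums, and μ-recursion with one variable.
data RTE
  = RNil
  | RCons BoolExp RTE
  | RSum RTE RTE
  | RVar        -- the variable bound by the nearest enclosing μ
  | RMu RTE     -- μX. e

data BoolExp = RTrue | RFalse | RAnyBool

-- matches e xs: does the list xs belong to the language of e?
matches :: RTE -> [Bool] -> Bool
matches e0 = go e0 Nothing
  where
    go RNil        _        xs       = null xs
    go (RCons b e) top      (x : xs) = bmatch b x && go e top xs
    go (RCons _ _) _        []       = False
    go (RSum a b)  top      xs       = go a top xs || go b top xs
    go (RMu e)     _        xs       = go e (Just e) xs
    go RVar        (Just e) xs       = go e (Just e) xs
    go RVar        Nothing  _        = False
    bmatch RTrue    x = x
    bmatch RFalse   x = not x
    bmatch RAnyBool _ = True

-- μX. cons(true, X) + nil: the lists containing only True.
onlyTrue :: RTE
onlyTrue = RMu (RSum (RCons RTrue RVar) RNil)

main :: IO ()
main = print (map (matches onlyTrue) [[], [True, True], [True, False]])
```

A generator-thinning tool could use such a membership check as a filter, keeping only the random inputs that match the manually specified expressions.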

A different approach would be to create a tool that synthesizes QuickCheck generators that only generate terms matching a particular regular tree expression. This idea, related to work on adapting branching processes to control test distributions [29], would make it easy to write highly customized generators and meticulously control the generated test suites.

### **Acknowledgments**

Thank you to Calvin Beck, Filip Niksic, and Irene Yoon for help with the early development of these ideas. Thanks to Kostis Sagonas, Alexandra Silva, and Andrew Hirsch for feedback along the way. This work was supported by NSF awards #1421243, Random Testing for Language Design and #1521523, Expeditions in Computing: The Science of Deep Specification, by the Defense Advanced Research Projects Agency (DARPA) under Contract No. HR0011-18-C-0011, by the United States Air Force and DARPA under Contract No. FA8750-16-C-0022 (any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of Vetenskapsrådet, the NSF, the United States Air Force, or DARPA), and by Vetenskapsrådet in Sweden for funding under the SyTeC project (grant number 2016-06204).

# **References**

1. Arcuri, A., Briand, L.C.: Adaptive random testing: an illusion of effectiveness? In: Dwyer, M.B., Tip, F. (eds.) Proceedings of the 20th International Symposium on Software Testing and Analysis, ISSTA 2011, Toronto, ON, Canada, July 17-21, 2011, pp. 265–275. ACM (2011). https://doi.org/10.1145/2001420.2001452






**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/ 4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **For a Few Dollars More: Verified Fine-Grained Algorithm Analysis Down to LLVM**

Maximilian P. L. Haslbeck<sup>1</sup> and Peter Lammich<sup>2</sup>

<sup>1</sup> Technische Universität München, München, Germany, haslbema@in.tum.de
<sup>2</sup> The University of Manchester, Manchester, England, peter.lammich@manchester.ac.uk

**Abstract.** We present a framework to verify both functional correctness and worst-case complexity of practically efficient algorithms. We implement a stepwise refinement approach, using the novel concept of resource currencies to naturally structure the resource analysis along the refinement chain and to allow a fine-grained analysis of operation counts. Our framework targets the LLVM intermediate representation. We extend its semantics from earlier work with a cost model. As a case study, we verify the correctness and O(n log n) worst-case complexity of an implementation of the introsort algorithm, whose performance is on par with the state-of-the-art implementation found in the GNU C++ Library.

**Keywords:** Algorithm Analysis · Program Verification · Refinement

# **1 Introduction**

Not only the correctness but also the complexity of algorithms is important. While it is obvious that the performance observed during experiments is essential for solving practical problems efficiently, the theoretical worst-case complexity of algorithms is also crucial: a good worst-case complexity avoids timing regressions when hitting worst-case input and, even more importantly, prevents denial-of-service attacks that intentionally produce worst-case scenarios to overload critical computing infrastructure.

For example, the C++ standard requires implementations of std::sort to have worst-case complexity O(n log n) [7]. Note that this rules out quicksort [12], which is very fast in practice, but has quadratic worst-case complexity. Nevertheless, some standard libraries, most prominently LLVM's libc++ [20], still use sorting algorithms with quadratic worst-case complexity.<sup>3</sup>

A practically efficient sorting algorithm with O(n log n) worst-case complexity is Musser's introsort [22]. It combines quicksort with the O(n log n) heapsort algorithm, which is used as fallback when the quicksort recursion depth

<sup>3</sup> See, e.g., https://bugs.llvm.org/show\_bug.cgi?id=20837.

© The Author(s) 2021

N. Yoshida (Ed.): ESOP 2021, LNCS 12648, pp. 292–319, 2021.

https://doi.org/10.1007/978-3-030-72019-3\_11

exceeds a certain threshold. This makes it possible to implement standard-compliant, practically efficient sorting algorithms. Introsort is implemented by, e.g., the GNU C++ Library (libstdc++) [8].

In this paper, we present techniques to formally verify both correctness and worst-case complexity of practically efficient implementations. We build on two previous lines of research by the authors.

On one hand, we have the Isabelle Refinement Framework [19], which allows for a modular top-down verification approach. It utilizes stepwise refinement to separate the different aspects of an efficient implementation, such as the algorithmic idea and low-level optimizations. It provides a nondeterminism monad to formalize programs and refinements, and the Sepref tool to automate canonical data refinement steps. Its recent LLVM back end [15] makes it possible to verify algorithms with competitive performance compared to (unverified) highly optimized C/C++ implementations. The Refinement Framework has been used to verify the functional correctness of an implementation of introsort that performs on par with libstdc++'s implementation [17].

On the other hand, we have already extended the Refinement Framework to reason about complexity [11]. However, this extension only supports the Imperative/HOL back end [16], which generates implementations in functional languages; these are inherently less efficient than highly optimized C/C++ implementations. This paper combines and extends these two approaches. Our main contributions are.


Our formalization is available at https://www21.in.tum.de/~haslbema/ llvm-time.

# **2 Specification of Algorithms With Resources**

We use the formalism of monads [24] to elegantly specify programs with resource usage. We first describe a framework that works for a very generic notion of resource, and then instantiate it with resource functions, which model resources of different currencies. We then describe a refinement calculus and show how currencies can be used to structure stepwise refinement proofs. Finally, we report on automation and give some examples.

### **2.1 Nondeterministic Computations With Resources**

Let us examine the features we require for our computation model.

First, we want to specify programs by their desired properties, without having to fix a concrete implementation. In general, those programs have more than one correct result for the same input. Consider, e.g., sorting a list of pairs of numbers by the first element. For the input [(1, 2),(2, 2),(1, 3)], both [(1, 2),(1, 3),(2, 2)] and [(1, 3),(1, 2),(2, 2)] are valid results. Formally, this is modelled as a set of possible results. When we later fix an implementation, the set of possible results may shrink. For example, the (stable) insertion sort algorithm always returns the list [(1, 2),(1, 3),(2, 2)]. We say that insertion sort refines our specification of sorting.

Second, we want to define recursion by a standard fixed-point construction over a flat lattice. The bottom of this lattice must be a dedicated element, which we call fail. It represents a computation that may not terminate.

Finally, we want to model the resources required by a computation. For nondeterministic programs, these may vary depending on the nondeterministic choices made during the computation. As we model computations by their possible results, rather than by the exact path in the program that leads to the result, we also associate resource cost with possible results. When more than one computation path leads to the same result, we take the supremum of the used resources. The notion of refinement is now extended to a subset of results that are computed using less resources.

We now formalize the above intuition: The type

(α,γ) NREST = fail | res (α → γ option)

models a nondeterministic computation with results of type α and resources of type γ. <sup>4</sup> That is, a computation is either fail, or res M, where M is a partial function from possible results to resources.

We define spec Φ T as a computation of any result r that satisfies Φ r using T r resources: spec Φ T = res (λr. if Φ r then Some (T r) else None). By abuse of notation, we write spec x T for spec (λr. r=x) (λ_. T).

Based on an ordering on the resources γ, we define the refinement ordering on NREST by first lifting the ordering to option with None as the bottom element, then pointwise to functions, and finally to (α,γ) NREST, setting fail as the top element. This matches the intuition of refinement: m ≤ m′ reads as m refines m′, i.e., m has fewer possible results than m′, computed with less resources.
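As a toy illustration of these definitions (our own finite model, not the Isabelle formalization): partial functions become finite maps, a single Integer stands in for the resource lattice/monoid, and the refinement ordering is checked pointwise.

```haskell
import qualified Data.Map as M

-- A toy, finite model of NREST: a computation either fails or maps each
-- possible result to the worst-case resources needed to compute it.
data NREST a = Fail | Res (M.Map a Integer) deriving (Eq, Show)

-- spec over an explicitly given finite universe of candidate results.
spec :: Ord a => [a] -> (a -> Bool) -> (a -> Integer) -> NREST a
spec universe phi t = Res (M.fromList [(r, t r) | r <- universe, phi r])

-- m `refines` m': fail is the top element; otherwise every result of m
-- must also be a result of m', produced with no more resources.
refines :: Ord a => NREST a -> NREST a -> Bool
refines _ Fail = True
refines Fail _ = False
refines (Res m) (Res m') =
  and [maybe False (t <=) (M.lookup r m') | (r, t) <- M.toList m]

main :: IO ()
main = do
  -- Sorting [(1,2),(2,2),(1,3)] by first component: two valid results.
  let results  = [[(1, 2), (1, 3), (2, 2)], [(1, 3), (1, 2), (2, 2)]]
                   :: [[(Int, Int)]]
      sortSpec = spec results (const True) (const 9)   -- either order, cost 9
      insSort  = Res (M.fromList [(head results, 9)])  -- stable: one result
  print (insSort `refines` sortSpec)  -- True
  print (sortSpec `refines` insSort)  -- False
```

The example mirrors the sorting discussion above: the stable insertion sort, with its single possible result, refines the sorting specification, but not vice versa.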

We require the resources γ to have a complete lattice structure, such that we can form suprema over the (possibly infinitely many) paths that lead to the same result. Moreover, when sequentially composing computations, we need to add up the resources. This naturally leads to a monoid structure (γ, 0, +), where 0, intuitively, stands for no resources.

We call such types γ resource types, if they have a complete lattice and monoid structure. Note that, in an earlier iteration of this work [11], the resource type

<sup>4</sup> The name NREST abbreviates **N**ondeterministic **RES**ult with **T**ime, and has been inherited from our earlier formalizations.

was fixed to extended natural numbers (enat = ℕ ∪ {∞}), measuring the resource consumption with a single number. Also note that (α,unit) NREST is isomorphic to our original nondeterministic result monad without resources [19].

If γ is a resource type, so is η → γ. Intuitively, such resources consist of coins of different resource currencies η, the amount of coins being measured by γ.

Example 1. In the following we use the resource type ecost = string → enat, i.e., we have currencies described by a string, whose amount is measured by extended natural numbers, where ∞ models arbitrary resource usage. Note that, while the resource type string→enat guides intuition, most of our theory works for general resource types of the form η → γ or even just γ.

We define the function $_s n to be the resource function that uses n :: enat coins of the currency s :: string, and write $_s as a shortcut for $_s 1.

A program that sorts a list in O(n<sup>2</sup>) can be specified by:

$$sort_{\mathrm{spec}}\ xs = \mathsf{spec}\ (\lambda xs'.\ \mathrm{sorted}\ xs' \land mset\ xs' = mset\ xs)\ (\$_q\ |xs|^2 + \$_c)$$

that is, a list xs can result in any sorted list xs′ with the same elements, and the computation takes (at most) quadratically many q coins in the list length, and one c coin, independently of the list length. Intuitively, the q and c coins represent the constant factors of an algorithm that implements that specification and are later elaborated by exchanging them into several coins of more fine-grained currencies, corresponding to the concrete operations in the algorithm, e.g., comparisons and memory accesses. Abstract currencies like q and c only "have value" if they can be exchanged for other, meaningful currencies, and finally pay for the resource costs of a concrete implementation.

### **2.2 Atomic Operations and Control Flow**

In order to conveniently model actual computations, we define some combinators. The elapse m t combinator adds the (constant) resources t to all results of m:

```
elapse :: (α,γ) NREST → γ → (α,γ) NREST
elapse fail t = fail
elapse (res M) t = res (λx. case M x of None ⇒ None
                                      | Some t′ ⇒ Some (t + t′))
```
The program<sup>5</sup> return x computes the single result x without using any resources:

return :: α → (α,γ) NREST
return x = res [x → 0]

The combinator bind m f models the sequential composition of computations m and f, where f may depend on the result of m:

<sup>5</sup> Note that our shallow embedding makes no formal distinction between syntax and semantics. Nevertheless, we refer to an entity of type NREST, as program to emphasize the syntactic aspect, and as computation to emphasize the semantic aspect.

```
bind :: (α,γ) NREST → (α → (β,γ) NREST) → (β,γ) NREST
bind fail f = fail
bind (res M) f = Sup { elapse (f x) t | x t. M x = Some t }
```

If the first computation m fails, then also the sequential composition fails. Otherwise, we consider all possible results x with resources t of m, invoke f x, and add the cost t for computing x to the results of f x. The supremum aggregates the cases where f yields the same result, via different intermediate results of m, and also makes the whole expression fail if one of the f x fails.
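In the same toy finite model as before (a sketch of our own, with a single Integer standing in for the resource monoid and max as the supremum), elapse and bind can be written directly:

```haskell
import qualified Data.Map as M

data NREST a = Fail | Res (M.Map a Integer) deriving (Eq, Show)

-- elapse: add the constant cost t to every possible result.
elapse :: NREST a -> Integer -> NREST a
elapse Fail _ = Fail
elapse (Res m) t = Res (M.map (+ t) m)

-- bind: run f on every possible result of m, charge m's cost on top,
-- fail if any branch fails, and take the pointwise supremum (here: max)
-- where different branches yield the same result.
bind :: Ord b => NREST a -> (a -> NREST b) -> NREST b
bind Fail _ = Fail
bind (Res m) f
  | any isFail branches = Fail
  | otherwise = Res (M.unionsWith max [m' | Res m' <- branches])
  where
    branches = [elapse (f x) t | (x, t) <- M.toList m]
    isFail Fail = True
    isFail _    = False

main :: IO ()
main = do
  -- Finite analogue of Example 2 below: two paths (costs 1 and 2) reach
  -- the same result 0; bind aggregates them with the supremum, cost 2.
  let m = Res (M.fromList [(1, 1), (2, 2)]) :: NREST Integer
  print (bind m (\_ -> Res (M.fromList [(0 :: Integer, 0)])))
```

The usage in main is a finite analogue of the aggregation effect discussed in Example 2: the cost attributed to the single result 0 is the supremum over all paths that produce it.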

Example 2. We now illustrate an effect that stems from our decision to aggregate the resource usage of different computation paths that lead to the same result. Consider the program

res (λn::nat. Some ($_c n)); return 0

It first chooses an arbitrary natural number n, consuming n coins of currency c, and then returns the result 0. That is, there are arbitrarily many paths that lead to the result 0, consuming arbitrarily many c coins. The supremum of this is ∞, such that the above program is equal to elapse (return 0) ($_c ∞). Note that none of the computation paths actually attains the aggregated resource usage. We will come back to this in Section 4.4.

Finally, we use Isabelle/HOL's if-then-else and define a recursion combinator rec via a fixed-point construction [13], to get a complete set of basic combinators. As these combinators also incur cost in the target LLVM, we define resource aware variants. Furthermore we also derive a while combinator:

```
if_c b then c1 else c2 = elapse (r ← b; if r then c1 else c2) $_if
rec_c F x = elapse (rec (λD x. F (λx. elapse (D x) $_call) x) x) $_call
while_c b f s = rec_c (λD s. if_c b s then s′ ← f s; D s′ else return s) s
```
Here, the guard of if_c is a computation itself, and we consume an additional if coin to account for the conditional branching in the target model. Similarly, every recursive call consumes an additional call coin.

Assertions fail if their condition is not met, and return unit otherwise:

assert P = if P then return () else fail

They are used to express preconditions of a program. A Hoare triple for program m, with precondition P, postcondition Q, and resource usage t, is written as a refinement condition: m ≤ assert P; spec Q (λ_. t)

Example 3. Comparison of two list elements at a cost of t can be specified by:

idxs_cmp_spec xs i j (t) = assert (i < |xs| ∧ j < |xs|); spec (xs!i < xs!j) (λ_. t)

where xs!i is the ith element of list xs. Instead of fixing the cost for specifications, we pass it as a parameter t. This allows us to refine different instances of abstract data types (here, lists) by different concrete data structures with different costs. To make bigger programs more readable, we note the cost parameter in parentheses at the end of the line, as, e.g., in Example 4.

### **2.3 Refinement on NREST**

We have used the refinement ordering to express Hoare triples. Two other applications of refinement are data refinement and currency refinement.

Data Refinement A typical use-case of refinement is to implement an abstract data type by a concrete data type. For example, we could implement (finite) sets of numbers by sorted lists. We define a refinement relation R between sorted lists and sets. A concrete computation m† that yields sorted lists then refines an abstract computation m that yields sets, if every possible concrete result is related to a possible abstract result. Formally, m† ≤ ⇓_D R m, where the operator ⇓_D is defined, for arguments R and m, by the following two rules.

$$\Downarrow_D R\ (\mathsf{res}\ M) = \mathsf{res}\ (\lambda c.\ \mathrm{Sup}\ \{M\ a \mid a.\ (c,a) \in R\}) \qquad \Downarrow_D R\ \mathsf{fail} = \mathsf{fail}$$

Again, we use the supremum to aggregate the costs of all abstract results that are related to a concrete result. As in Example 2, this leads to the possibility that the supremum cost is not attained, which we discuss in Section 4.4.

Currency Refinement Suppose we want to refine Example 3 into a program that first accesses the elements and then compares them.

Example 4. We refine idxs_cmp_spec ($_idxs_cmp) from Example 3 as follows:

```
idxs_cmp xs i j =
  assert (i < |xs| ∧ j < |xs|);
  xsi ← list_get_spec xs i;      ($_lookup)
  xsj ← list_get_spec xs j;      ($_lookup)
  return (xsi < xsj)             ($_less)
```

where list_get_spec xs i (T) = assert (i < |xs|); spec (xs!i) T, and return x (T) returns the result x incurring cost T.

Note that idxs_cmp and idxs_cmp_spec use different, incompatible currency systems. To compare them, we need to exchange coins: one idxs_cmp coin will be traded for two lookup coins and one less coin.

To make that happen we introduce the currency refinement ⇓_C E m. Here, the exchange rate E :: η_a → η_c → γ specifies, for each abstract currency c_a :: η_a, how many of the coins of the concrete currency c_c :: η_c are needed. Note that, in general, one abstract coin may be exchanged into multiple coins of different currencies. For a resource type γ that provides a multiplication operation (∗), we define the operator ⇓_C with the following two rules.

$$\begin{array}{l} \Downarrow_C E\ (\mathsf{res}\ M) = \mathsf{res}\ (\lambda r.\ \mathsf{case}\ M\ r\ \mathsf{of}\ \mathsf{None} \Rightarrow \mathsf{None} \\ \qquad\qquad\qquad \mid \mathsf{Some}\ t \Rightarrow \mathsf{Some}\ (\lambda c_c. \textstyle\sum_{c_a} t\ c_a \ast E\ c_a\ c_c)) \\ \Downarrow_C E\ \mathsf{fail} = \mathsf{fail} \end{array}$$

The refined computation has the same results as the original. To get the amount of a concrete coin c_c for some result r with resource function t, we sum, over all abstract coins c_a, the amount of abstract coins needed in the original computation (t c_a), weighted by the exchange rate (E c_a c_c).

For the sum to make sense, there must be only finitely many abstract coins c_a with t c_a ∗ E c_a c_c ≠ 0. This can be ensured by restricting the resource functions t of the computation to use finitely many different coins, or by restricting the exchange rate E accordingly. The latter can be checked syntactically in practice.
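A minimal executable sketch of this exchange (our own illustration): resource functions and exchange rates become finite maps, so the sum over abstract coins is automatically finite.

```haskell
import qualified Data.Map as M

-- Resource functions: finitely many coins of named currencies.
type Res = M.Map String Integer

-- Apply an exchange rate: for each abstract coin, the map gives the
-- concrete coins it is traded for. Amounts multiply per abstract coin
-- and sum over all abstract coins; absent coins count as zero.
exchange :: M.Map String Res -> Res -> Res
exchange rate t =
  M.unionsWith (+)
    [M.map (n *) conc | (ca, n) <- M.toList t, Just conc <- [M.lookup ca rate]]

main :: IO ()
main = do
  -- Modeled on Example 5 below: one idxs_cmp coin is traded for two
  -- lookup coins and one less coin.
  let e1 = M.fromList [("idxs_cmp", M.fromList [("lookup", 2), ("less", 1)])]
  print (exchange e1 (M.fromList [("idxs_cmp", 1)]))
```

Running main trades a single idxs_cmp coin for two lookup coins and one less coin, matching the exchange rate given in Example 5.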

Example 5. For refining the specification idxs_cmp_spec we can use the exchange rate E_1 = 0(idxs_cmp := $_lookup 2 + $_less), which does the correct exchange for idxs_cmp and is zero everywhere else. Here, + and 0 are lifted to functions in a pointwise manner, and f(·:=·) denotes a function update. We can now prove:

idxs_cmp xs i j ≤ ⇓_C E_1 (idxs_cmp_spec xs i j ($_idxs_cmp))

### **2.4 Refinement Patterns**

In practice, we encounter certain recurring patterns of refinement, which we describe in this section.

Refinement of Specifications Instead of only asking whether a program m satisfies a specification res M, we also ask how much it satisfies the specification, i.e., what is the difference between the resources specified and those actually used, denoted by gwp m M.<sup>6</sup> We have the following equality: m ≤ res M ⇔ Some 0 ≤ gwp m M.

To get some intuition let us fix the resource to be time. Then, gwp m M is the latest feasible time at which we can start m to still match the deadline M. If there is no feasible starting time (gwp m M = None), m does not fulfill the specification M. If it has some value t, this is the latest feasible starting time of all computation paths in m.

Using gwp, we can implement a syntax-driven verification condition generator, as already described in [11].

Lockstep Refinement We often refine a compound program by refining some of its components. Let A and C be two structurally equal programs (i.e., they have the same structure of combinators if_c, rec_c, bind, etc.), and let A_i and C_i be the pairs of corresponding basic components, for i ∈ {0,..., n}. Provided with refinement lemmas Φ_i x ∧ (x†, x) ∈ R′_i =⇒ C_i x† ≤ ⇓_D R_i (⇓_C E (A_i x)) for each of those pairs,<sup>7</sup> an automatic procedure walks through the program and establishes a refinement C ≤ ⇓_D R_n (⇓_C E A). This process generates verification conditions for ensuring the preconditions Φ_i, which can be discharged automatically or, if required, via interactive proof.

<sup>6</sup> The definition of gwp requires γ to provide a difference operator, dual to its + operator. It is a straightforward generalization of the concept defined in [11], and thus omitted here. We only note that the resource types unit, enat, and ecost provide a suitable difference operator.

<sup>7</sup> The refinement relations R′_i and R_i relate the parameters and the results, respectively, of those components.

Note that, while the data refinements R<sup>i</sup> can be different for each component i, the exchange rate E must be the same for all components. Currently, we align the exchange rates by manually deriving specialized versions of the component refinement lemmas. However, we believe that this can be automated in many practical cases, by collecting constraints on the exchange rate during the lockstep refinement, which are solved afterwards to obtain a unified exchange rate. We leave the implementation of this idea to future work.

Separating Analysis of Resource Usage and Correctness We can disregard resource usage and only focus on refinement of functional correctness, and then add resource usage analysis later. This is useful to separate the concerns of functional correctness and resource usage proof. We will describe a practical example later (Section 5.5); here we only present an alternative way to prove the refinement in Example 4:

First, for functional correctness, we use the specification idxs_cmp_spec (∞) and a program idxs_cmp_∞ similar to idxs_cmp but with all the costs replaced by ∞. Proving the refinement idxs_cmp_∞ xs i j ≤ idxs_cmp_spec xs i j (∞) only requires showing verification conditions that correspond to functional properties and termination. In particular, assertions and annotated invariants in the concrete program have to be proved. Proof obligations on resource usage, however, collapse into the trivial t ≤ ∞. For the same reason, we get idxs_cmp xs i j ≤ idxs_cmp_∞ xs i j, and by transitivity obtain

idxs_cmp xs i j ≤ idxs_cmp_spec xs i j (∞)

Next, we prove idxs_cmp xs i j ≤_n spec (λ_. True) ($_lookup 2 + $_less). Here, the refinement relation m ≤_n m′ ≡ (m ≠ fail =⇒ m ≤ m′) assumes that the concrete program does not fail. This has the effect that, during the refinement proof, assertions and annotated invariants in the concrete program can be assumed to hold, and we can focus on the resource usage proof.

Finally, the two refinements can be combined to obtain

idxs_cmp xs i j ≤ idxs_cmp_spec xs i j ($_lookup 2 + $_less)

# **3 LLVM With Cost Semantics**

The NREST monad allows us to specify programs together with their resource usage in abstract currencies. Those currencies only acquire meaning when they can finally be exchanged for the costs of concrete computations. In the following, we present such a concrete computation model, namely a shallow embedding of the LLVM semantics into Isabelle/HOL. The embedding is an extension of our earlier work [15] that also accounts for costs. In Section 4 we then report on linking the LLVM back end with the NREST front end.

### **3.1 Basic Monad**

At the basis of our LLVM formalization is a monad that provides the notions of non-termination, failure, state, and execution costs.

α mres = NTERM | FAIL | SUCC α cost state
α M = state → α mres

Here, cost is a type for execution costs, which forms a monoid with operation + and neutral element 0, and state is an arbitrary type.<sup>8</sup>

The type α M describes a program that, when executed on a state, either does not terminate (NTERM), fails (FAIL), or returns a result of type α, its execution costs, and a new state (SUCC).

It is straightforward to define the monad operations return and bind, as well as a recursion combinator rec over M. Thanks to the shallow embedding, we can also use Isabelle/HOL's if-then-else to get a complete set of basic operations. As an example, we show the definition of the bind operation, in the case that both arguments successfully compute a result:

Assume m s = SUCC x c₁ s₁ and f x s₁ = SUCC r c₂ s₂. Then we have bind m f s = SUCC r (c₁ + c₂) s₂.

That is, the result x and state s₁ after the first operation m are passed into the second operation f, and the result and state after the bind are what emerges from f. The cost of the bind is the sum of the costs of both operations.

The basic monad operations do not cost anything. To account for execution costs, we define an explicit operation consume c s = SUCC () c s.<sup>9</sup>
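To make the construction concrete, here is a minimal Python sketch of this basic monad. It is illustrative only: costs are plain integers rather than the full cost monoid, the state is an opaque value, and all names (`ret`, `bind`, `consume`) merely mirror the text.

```python
# Minimal sketch of the basic monad: a program maps a state to NTERM,
# FAIL, or SUCC(result, cost, state'). Illustrative only; costs are
# plain integers instead of the cost monoid from the text.
NTERM, FAIL = "NTERM", "FAIL"

def SUCC(r, c, s):
    return ("SUCC", r, c, s)

def ret(r):
    # the monad's return: succeed with zero cost, state unchanged
    return lambda s: SUCC(r, 0, s)

def bind(m, f):
    # run m; on success, pass result and state to f and add the costs
    def run(s):
        out = m(s)
        if out in (NTERM, FAIL):
            return out
        _, x, c1, s1 = out
        out2 = f(x)(s1)
        if out2 in (NTERM, FAIL):
            return out2
        _, r, c2, s2 = out2
        return SUCC(r, c1 + c2, s2)
    return run

def consume(c):
    # the only operation that charges execution costs
    return lambda s: SUCC((), c, s)

# a small program: charge 2, then 3, then return 7
prog = bind(consume(2), lambda _: bind(consume(3), lambda _: ret(7)))
```

Running `prog` on some state yields SUCC with result 7 and accumulated cost 5; non-termination and failure propagate through bind unchanged, as in the definition above.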

### **3.2 Shallowly Embedded LLVM Semantics**

The formalization of the LLVM semantics is organized in layers. At the bottom, there is a memory model that stores deeply embedded values and comes with basic operations for allocation/deallocation, loading, storing, and pointer manipulation. The basic arithmetic operations, too, are defined on deeply embedded integers. These operations are phrased in the basic monad, but consume no costs. This way, we could take them unchanged from our original LLVM formalization without costs [15]. For example, the low-level load operation has the signature raw_load :: raw_ptr → val M. Here, raw_ptr is the pointer type of our memory model, consisting of a block address and an offset, and val is our value type, which can be an integer, a pointer, or a pair of values.

On top of the basic layer, we define operations that correspond to the actual LLVM instructions. Here, we map from deeply embedded values to shallowly embedded values, and add the execution costs.

For example, the semantics of LLVM's load instruction is defined as follows:

<sup>8</sup> Note that this differs from the NREST monad in Section 2.1: it is deterministic, and provides a state. Because of determinism, we never need to form a supremum, and thus can base our cost model on natural numbers rather than enats. We leave a unification of the two monads to future work.

<sup>9</sup> For NREST, we defined a higher-order operation elapse, while here we use the first-order operation consume. This is for historical reasons. Note that elapse can be defined in terms of consume, and vice versa.

ll_load :: α ptr → α M
ll_load p = consume \$load; r ← raw_load (the_raw_ptr p); checked_from_val r

It consumes the cost<sup>10</sup> for the operation, and then forwards to the raw_load operation of the lower layer, where the_raw_ptr and checked_from_val convert between the shallow and deep embeddings of values.

As in the original formalization<sup>11</sup>, an LLVM program is represented by a set of monomorphic constant definitions of the shape def, defined as follows:

def = proc_name var∗ ≡ block
block = var ← cmd; block | return var
cmd = ll_&lt;opcode&gt; arg∗ | ll_call proc_name arg∗ | llc_if arg block block | llc_while block block
arg = var | number | null | init

The code generator checks that the set of definitions is complete and adheres to the required shape. It then translates the definitions into LLVM code, which merely amounts to pretty printing and translating the structured control flow of if and while<sup>12</sup> statements into the unstructured control flow of LLVM. A powerful preprocessor can convert a more general class of terms into the restricted shape required by the code generator. This conversion is done inside the logic, i.e., the processed program is proved to be equal to the original one. Preprocessing steps include monomorphization of polymorphic constants, extraction of fixed-point combinators to recursive function definitions, and conversion of tuple constructors and destructors to LLVM's insertvalue and extractvalue instructions.

In summary, the layered architecture of our LLVM formalization allowed for a smooth integration of the cost aspect, reusing most of the existing formalization nearly unchanged. Note that we opted to integrate the cost aspect into the existing top layer, which converts between deep and shallow embedding. Alternatively, we could have added another layer on top of the shallow embedding. While the latter would have been the cleaner design, we opted for the former approach to avoid the boilerplate of adding a new layer. This was feasible as the original top layer was quite thin, such that adding another aspect there did not result in excessive complexity.

<sup>10</sup> See Section 3.3 for an explanation of our cost model.

<sup>11</sup> Actually, the only change to the original formalization is the introduction of the ll_call instruction, to make the costs of a function call visible.

<sup>12</sup> Primitive while loops are not strictly required, as they can always be replaced by tail recursion. Indeed, our code generator can be configured to not accept while loops, and our preprocessor can automatically convert while loops to tail-recursive functions. However, the efficiency of the generated code then relies on LLVM's optimization pass to detect the tail recursion and transform it to a loop again.

### **3.3 Cost Model**

As a cost model for running time, we chose to count how often each instruction is executed. That is, we set cost = string → nat, where the string encodes the name of an instruction. It is straightforward to define 0 and + such that (cost,0,+) forms a monoid. It is thus a valid cost model for our monad.
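As an illustration (not part of the formalization), the cost monoid can be sketched in Python with `collections.Counter`, which already provides pointwise addition and the empty map as neutral element; the instruction names are hypothetical.

```python
from collections import Counter

# Sketch of cost = string -> nat as finitely supported maps from
# instruction names to counts; Counter's '+' is pointwise addition,
# and the empty Counter plays the role of 0.

def cost(name, n=1):
    return Counter({name: n})

zero = Counter()

a = cost("load", 2) + cost("add")
b = cost("load")
```

The monoid laws then hold by construction: addition is associative and commutative, and adding `zero` changes nothing.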

But how realistic is our cost model, which counts LLVM instructions? During compilation, the LLVM text is transformed by LLVM's optimizer, and finally LLVM's back end translates LLVM instructions to machine instructions. Moreover, the actual running time of a machine program depends not only on the number of executed instructions; effects like pipeline flushes and cache misses also play an important role. Thus, without factoring in the details of the optimization passes and the target machine architecture, our cost model can, at best, be a rough approximation of the actual running time.

However, we can sensibly assume that a single instruction in the original LLVM text will result in at most a (small) constant number of machine instructions, and that each machine instruction has a constant worst-case execution time. Thus, the steps counted by our model linearly correlate to an upper bound of the actual execution time, though the exact correlation depends on the actual program, optimizer passes, and target architecture. Hence, while our cost model cannot be used for precise statements about execution time, it can be used to prove worst-case complexity. That is, a program that we have proved efficient will be compiled to an efficient machine program. Moreover, we can hope that the constant factors in the proved complexity are related to the actual constant factors in the machine program, i.e., an LLVM program with small constant factors will compile to a machine program with small constant factors.

The above discussion justifies the following design choices: The insertvalue and extractvalue instructions, which are used to construct and destruct tuple values, have no associated costs. The main reason for this design is to enable transparent use of tupled values, e.g., to encode the state of a while loop. We expect LLVM to translate the members of the tuple to separate registers anyway, such that no real costs are associated with tupling/untupling.

We define the malloc instruction to take cost proportional to the number of allocated elements. Note that LLVM itself does not provide memory management, and our code generator forwards memory management instructions to the libc implementation of the target platform. We use the calloc function here, which is supposed to initialize the allocated memory with zeros. While the exact costs of that are implementation dependent, they certainly will depend on the size of the allocated block.

Charguéraud and Pottier [6, §2.7] discuss the adequacy of abstract cost models in a functional setting. In their classification, our abstraction is on Level 2.

### **3.4 Reasoning Setup**

Once we have defined the semantics, we need to set up some basic reasoning infrastructure. The original Isabelle-LLVM already comes with a quite generic separation logic and verification condition generation framework. Here, we report on our extensions for reasoning about resources using time credits.

Separation Logic with Time Credits Our reasoning infrastructure is based on separation logic with time credits [1,6,10]. We follow the algebraic approach of Calcagno et al. [3], using an earlier extension [15] of Klein et al. [18].

A separation algebra on type α induces a separation logic on assertions that are predicates over α. To guide intuition, elements of α are called heaps here. We use the following separation logic operators: the assertion ↑Φ holds for an empty heap if Φ holds, □ = ↑True describes the empty heap, and ∃A is the existential quantifier lifted to assertions. The separating conjunction P ∗ Q describes a heap comprised of two disjoint parts, one described by P and the other described by Q, and the entailment P ⊢ Q states that Q holds for every heap described by P.

Separation algebras naturally extend over product and function types, i.e., for separation algebras α, β, and any type γ, also α × β and γ → α are separation algebras, where the operations are lifted pointwise.

Note that enat forms a separation algebra, where elements, i.e., time credits, are always disjoint. Hence ecost = string → enat and amemory × ecost are also separation algebras, where amemory is the separation algebra that we already used in [15] to describe the abstract memory of LLVM. Thus, amemory × ecost induces a separation logic with time credits that matches our cost model. The time credit assertion \$c = (λa. a = (0,c)) describes an empty memory (0) and precisely the time credits c. The primitive assertions on amemory are lifted analogously, describing no time credits.

Weakest Precondition and Hoare Triples We start by defining a concrete state cstate that describes the memory content and the available resources:

cstate = memory × ecost

where memory is the memory type from our original LLVM formalization. Based on this, we define the weakest precondition predicate:

wp :: α M → (α → cstate → bool) → cstate → bool
wp m Q (s, cc) = (∃r c s′. m s = SUCC r c s′ ∧ c ≤ cc ∧ Q r (s′, cc − c))

Intuitively, the credits cc stored in the state are the credits available to the program. The weakest precondition holds if the program runs with real costs c that are within the available credits, and Q holds for the result r, the new memory s′, and the new credits cc − c, i.e., the old credits reduced by the actually required costs. Note that actual costs have type cost = string → nat, i.e., are always finite, while the credits have type ecost = string → enat, i.e., there can be infinite credits. Setting the credits to be infinite for all instruction types yields the classical weakest precondition that requires termination but enforces no time limit.
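The weakest precondition predicate can be mimicked in Python over a simplified model where a program maps a state to NTERM, FAIL, or SUCC(result, cost, state′). This is a hedged sketch: costs are ints, credits are ints or `math.inf` (mimicking the nat-vs-enat distinction), and `charge3` is a made-up example program.

```python
import math

# Program outcomes: non-termination, failure, or success with result,
# cost, and new state.
NTERM, FAIL = "NTERM", "FAIL"

def SUCC(r, c, s):
    return ("SUCC", r, c, s)

def wp(m, Q, conf):
    # wp m Q (s, cc): m must succeed, its real cost c must fit into the
    # available credits cc, and Q must hold for result, new state, and
    # the remaining credits cc - c.
    s, cc = conf
    out = m(s)
    if out in (NTERM, FAIL):
        return False
    _, r, c, s1 = out
    return c <= cc and Q(r, (s1, cc - c))

# an example program that charges 3 cost units and returns 42
charge3 = lambda s: SUCC(42, 3, s)
```

With credits 5, the program succeeds and leaves 2 credits; with credits 2 the wp fails; with infinite credits one obtains the classical termination-only wp mentioned above.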

Our concrete state type, in particular the memory, does not form a separation algebra, as LLVM's memory model has no natural notion of partial memories. Thus, we define an abstraction function that maps a concrete state to an abstract state astate, which forms a separation algebra:

astate = amemory × ecost        abs (m, c) = (abs_m m, c)

Here, amemory and abs_m are the abstract memory and the abstraction function from the original LLVM formalization. The costs already form a separation algebra, so we do not abstract them further.

With this, we can instantiate a generic VCG infrastructure: let cstate be concrete states, wp :: α M → (α → cstate → bool) → cstate → bool be a weakest precondition predicate, and astate an abstract state, linked to concrete states via an abstraction function abs :: cstate → astate. Further, assume that wp distributes over conjunctions, i.e.,

$$wp \ c \ Q\_1 \ s \land wp \ c \ Q\_2 \ s \implies wp \ c \ (\lambda r \ s'. \ Q\_1 \ r \ s' \land \ Q\_2 \ r \ s') \ s$$

Finally, let ⊤ be an affine top [5], i.e., an assertion with □ ⊢ ⊤ and ⊤ ∗ ⊤ = ⊤, which captures resources that can be safely discarded. We define the Hoare triple {P} c {Q} to hold iff:

(∀F s. (P ∗ F) (abs s) =⇒ wp c (λr s′. (Q r ∗ ⊤ ∗ F) (abs s′)) s)

Intuitively, {P} c {Q} holds if, for all states that contain a part described by assertion P, command c terminates with result r and a state where that part is replaced by a part described by Q r, while the rest of the state is unchanged. Here, Q r is the postcondition of the Hoare triple, and ⊤ describes resources that may be left over and can be discarded.

In our case, we set ⊤ to describe the empty memory and any amount of time credits. This matches the intuition that a program must free all its memory, but may run faster than estimated, i.e., leave over some time credits. Note that our wp distributes over conjunctions.

The generic VCG infrastructure now provides us with a syntax-driven VCG with a simple frame-inference heuristic.

### **3.5 Primitive Setup**

Once we have defined the basic reasoning infrastructure, we have to prove Hoare triples for the basic LLVM instructions and control flow combinators. As we have added the cost aspect only at the top level of our semantics, we can reuse most of the material from our original LLVM formalization without time. Technically, we instantiate our reasoning infrastructure with a weakest precondition predicate wpn, which only holds for programs that consume no costs. We define:

wpn m Q s = wp m (FST ∘ Q) (s, 0)    where FST P = λ(s,c). P s ∧ c = 0

The resulting reasoning infrastructure is identical to that of our original formalization, most of which could be reused. Only for the topmost level, i.e., for those functions that correspond to the functional semantics of the actual LLVM instructions, do we lift the Hoare triples over wpn to Hoare triples over wp:

{P} c {Q}wpn = {FST P} c {FST ∘ Q}

Example 6. Recall the low-level raw load and the high-level ll load instruction from Section 3.2. The raw load instruction consumes no costs, and our original LLVM formalization provides the following Hoare triple:

{raw_pto p x} raw_load p {λr. ↑(r=x) ∗ raw_pto p x}wpn

This can be transferred to a Hoare triple over wp:

{FST (raw_pto p x)} raw_load p {λr. ↑(r=x) ∗ FST (raw_pto p x)}

which is then used to prove the Hoare triple for the program ll_load:

{\$ \$load ∗ pto p x} ll_load p {λr. ↑(r=x) ∗ pto p x}

where pto p x = FST (raw_pto (the_raw_ptr p) (to_val x)).

Using the VCG and the Hoare triples for the LLVM instructions, we can now define and prove correct data structures and algorithms. While this works smoothly for simple data structures like arrays, it does not scale to more complex developments. In contrast, NREST does scale, but lacks support for the low-level pointer reasoning required for basic data structures. In the next section, we show how to combine both approaches, with the LLVM level providing basic data structures and the NREST level using them as building blocks for larger algorithms.

# **4 Automatic Refinement**

In this section we describe a tool to synthesize a concrete program in the LLVM-monad from an abstract algorithm in the NREST-monad. It can automatically refine abstract functional data structures to imperative heap-based ones. We describe the synthesis predicate hnr that connects the two monads, the synthesis tool, and a way to extract Hoare triples from hnr predicates. Finally, we discuss an effect that prevents combining hnr with data refinements in the NREST-monad in the general case.

### **4.1 Heap Nondeterminism Refinement**

The heap nondeterminism refinement predicate hnr Γ m† Γ′ R m intuitively expresses that the concrete program m† computes a concrete result that relates, via the refinement assertion R, to a result of the abstract program m, using at most the resources specified by m for that result. A refinement assertion describes how an abstract variable is refined by a concrete value on the heap. It can also contain time credits. The assertions Γ and Γ′ describe the heaps before and after the computation and typically are a separating conjunction of refinement assertions for the respective parameters of m† and m. Formally, we define:

hnr Γ m† Γ′ R m = (m ≠ fail =⇒
    (∀F s c. (Γ ∗ F) (abs_m s, c) =⇒
        (∃ra ca. elapse (return ra) ca ≤ m
            ∧ wp m† (λr (s′, c′). (Γ′ ∗ R r ra ∗ F ∗ ⊤) (abs_m s′, c′)) (s, c + ca))))

The predicate holds if either the abstract program fails or if, for all heaps and resources (s, c) that satisfy the pre-assertion Γ with some frame F, there exists an abstract result and cost (ra, ca) that refine m, and m† terminates with concrete result r in a state s′ where Γ′ together with the frame holds, and r relates to the abstract result via assertion R. The execution costs of m† and the time credits c′ required by the post-assertion Γ′ are paid for by the specified cost ca and the time credits c described by the pre-assertion Γ. Thus, the real costs are paid by a combination of the advertised costs in the abstract program and the potential difference between Γ and Γ′, which seamlessly models amortized computation costs.

Using the affine top ⊤, it is possible for the program to throw away portions of the heap. Note that our ⊤ can only discard time credits. Memory must be explicitly freed by the concrete program m†.

Also note that hnr is not tied to the LLVM semantics specifically. It actually is a general pattern for combining the NREST-monad with any other program semantics that provides a weakest precondition and a separation algebra for data and resources.

### **4.2 The Sepref Tool**

The Sepref tool [14,15] automatically synthesizes a concrete program in the LLVM-monad from an abstract algorithm in the NREST-monad. It symbolically executes the abstract program while maintaining refinements for the abstract variables to a concrete representation and generates a concrete program as well as a valid hnr predicate. Proof obligations<sup>13</sup> that occur during this process are discharged automatically, guided by user-provided hints where necessary.

The synthesis requires rules for all abstract combinators. For example, bind is processed by the following rule:

```
1  hnr Γ m† Γ′ Rx m;
2  (∀x x†. hnr (Rx x† x ∗ Γ′) (f† x†) (R′x x† x ∗ Γ′′) Ry (f x));
3  MK_FREE R′x free =⇒
4  hnr Γ (x† ← m†; r† ← f† x†; free x†; return r†) Γ′′ Ry (x ← m; f x)
```
To refine x ← m; f x, we first execute m, synthesizing the concrete program m† (line 1). The state after m is Rx x† x ∗ Γ′, where x is the result created by m. From this state, we execute f x and synthesize f† x† (line 2). The new state is R′x x† x ∗ Γ′′ ∗ Ry y† y, where y is the result of f x. Now, the intermediate variable x goes out of scope and has to be deallocated. The predicate MK_FREE R′x free (line 3) states that free is a deallocator for data structures implemented by refinement assertion R′x. Note that free can only use time credits that are stored in R′x. Typically, these are paid for during creation of the data structure. This way, amortization can effectively be used to hide the necessary free operation and its costs in the abstract program.

All other combinators (recc, ifc, whilec, etc.) have similar rules that are used to decompose an abstract program into parts, synthesize corresponding concrete parts recursively, and combine them afterwards with the respective combinators from LLVM. At the leaves of this decomposition, atomic operations need to be provided with suitable synthesis predicates.

<sup>13</sup> E.g. from implementing mathematical integers with fixed-bit machine words.

An example is a list lookup that is implemented by an array:

hnr (arrayA p xs ∗ snatA i† i) (array_nth p i†)
    (arrayA p xs ∗ snatA i† i) idA (array_getspec xs i)

where arrayA, snatA, and idA relate a list with an array, an unbounded natural number with a bounded signed word, and identical elements, respectively. With an array at address p holding the list xs and an index i† that is a bounded signed word representing the unbounded natural number i, array_nth leaves the parameters unchanged and extracts the element specified by list_getspec, incurring costs array_getcost = \$ofs_ptr + \$load.

Ideally, each operation has its own currency (e.g. list_get). However, as our definition of hnr does not support currency refinement, the basic operations must use the currencies of the LLVM cost model. To still obtain modular hnr rules, we encapsulate specifications for data structures with their cost, e.g. by defining array_getspec = list_getspec (λ_. array_getcost). These can easily be introduced in an additional refinement step. Automating this process, and possibly integrating currency refinement into hnr, is left to future work.

### **4.3 Extracting Hoare Triples**

Note that hnr predicates cannot always be expressed as Hoare triples, as the running time bound of the abstract program may depend on the result, which we cannot refer to in the precondition of a Hoare triple, where we have to express the allowed running time as time credits. However, if the running time bound does not depend on the result, we can write hnr as a Hoare triple:

hnr Γ m† Γ′ R (spec Φ (λ_. T)) =
    {\$T ∗ Γ} m† {λr. Γ′ ∗ ∃A ra. R r ra ∗ ↑(Φ ra)}

While intermediate components might not be of this form, final algorithms typically are. At the end of a development, this rule allows extracting a Hoare triple in the underlying LLVM semantics, cutting out the NREST-monad. To validate the correctness claim of an algorithm, only the final Hoare triple needs to be inspected, which only uses concepts of the underlying semantics.

Note that the above rule is an equivalence. Thus, it can also be used to obtain synthesis rules from Hoare triples provided by the basic VCG infrastructure.

### **4.4 Attain Supremum**

We comment on a problem that arises when composing hnr predicates and data refinement in the NREST monad. Consider the following programs and relations:

m′ = res [x ↦ \$a, y ↦ \$b]        RR = {(z, x), (z, y)}
m = res [z ↦ \$a + \$b]            RA = idA
m† = consume (\$a + \$b); return z

Data refinement defines the resource bound for a concrete result (here z) as the supremum over all bounds of related results (here x, y). Thus, we have m ≤ ⇓C RR m′. Moreover, we trivially have hnr □ m† □ RA m. Intuitively, we want to compose these two refinements to obtain hnr □ m† □ (RA ◦ RR) m′. However, as our definition of hnr does not form a supremum, this would require \$a + \$b ≤ \$a or \$a + \$b ≤ \$b, which obviously does not hold.
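The effect can be illustrated concretely in Python (an illustrative encoding, not the formalization): model costs as maps from currency names to counts, with pointwise ordering and pointwise supremum.

```python
# Costs as maps currency -> count; ordering and supremum are pointwise.
def sup(c1, c2):
    return {k: max(c1.get(k, 0), c2.get(k, 0)) for k in set(c1) | set(c2)}

def leq(c1, c2):
    return all(v <= c2.get(k, 0) for k, v in c1.items())

dollar_a, dollar_b = {"a": 1}, {"b": 1}

# the bound the data refinement assigns to z: the supremum of the
# bounds of the related results x and y
bound_z = sup(dollar_a, dollar_b)
```

Since the two currencies are disjoint, the supremum coincides with \$a + \$b, yet it is strictly above each individual bound, so no single related result attains it.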

We have not yet found a way to define hnr or ⇓D in a form that does not exhibit this effect. Instead, we explicitly require that the supremum of the data refinement has a witness. The predicate attains_sup m m′ RR characterizes that situation: it holds if, for every result r of m, the supremum of the resource bounds that m′ assigns to the results r′ with (r, r′) ∈ RR is itself among those bounds. This trivially holds if RR is single-valued, i.e., any concrete value is related to at most one abstract value, or if m′ is one-time, i.e., assigns the same resource bound to all its results.

In practice we do encounter non-single-valued relations<sup>14</sup>, but they only occur for intermediate results where the composition with an hnr predicate is not necessary. Also, collapsing synthesis predicates and refinements in the NREST-monad is typically performed for the final algorithm, whose running time does not depend on the result; it thus is one-time and ultimately attains the supremum.

# **5 Case Study: Introsort**

In this section, we apply our framework to the introsort algorithm [22]. We build upon the verification of its functional correctness [17] to verify its running time analysis and to synthesize competitive, efficient LLVM code. Following the "top-down" mantra, we use several intermediate steps to refine a specification down to an implementation.

### **5.1 Specification of Sorting**

We start with the specification of sorting a slice of a list:

```
slice_sortspec xs0 l h (T) =
  assert (l ≤ h ∧ h ≤ length xs0);
  spec (λxs. slice_sort_aux xs0 l h xs) (λ_. T)
```
where slice_sort_aux xs0 l h xs states that xs is a permutation of xs0, that xs is sorted between l and h, and that it is equal to xs0 everywhere else.

### **5.2 Introsort's Idea**

The introsort algorithm is based on quicksort. Like quicksort, it finds a pivot element, partitions the list around the pivot, and recursively sorts the two partitions. Unlike quicksort, however, it keeps track of the recursion depth, and if it

<sup>14</sup> The relation oarr, described in earlier work [17, 4.2] by one of the authors, is used to model ownership of parts of a list on an abstract level and is an example for a relation that is not single-valued.

exceeds a certain value (typically ⌈2 log n⌉), it falls back to heapsort to sort the current partition. Intuitively, quicksort's worst-case behaviour can only occur when unbalanced partitioning causes a high recursion depth; the introsort algorithm limits the recursion depth, falling back to the O(n log n) heapsort algorithm. This combines the good practical performance of quicksort with the good worst-case complexity of heapsort.

Our implementation of introsort follows the implementation of libstdc++, which includes a second optimization: a first phase executes quicksort (with fallback to heapsort), but stops the recursion when the partition size falls below a certain threshold τ. Then, a second phase sorts the whole list with one final pass of insertion sort. This exploits the fact that insertion sort is actually faster than quicksort for almost-sorted lists, i.e., lists where any element is less than τ positions away from its final position in the sorted list. While the optimal threshold τ needs to be determined empirically, it does not influence the worst-case complexity of the final insertion sort, which is O(τn) = O(n) for constant τ. The threshold τ will be an implicit parameter from now on.
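The two-phase scheme just described can be sketched in executable form. This is a plain Python illustration of the strategy, not the verified algorithm: the Lomuto partitioning, the threshold `TAU`, and the depth limit `2·⌊log₂ n⌋` are simplifications chosen for the sketch.

```python
import math

TAU = 16  # threshold for leaving partitions to the final insertion sort

def heapsort_slice(xs, l, h):
    # fallback sort for xs[l:h] via a binary heap
    import heapq
    heap = xs[l:h]
    heapq.heapify(heap)
    xs[l:h] = [heapq.heappop(heap) for _ in range(len(heap))]

def partition(xs, l, h):
    # Lomuto partition around the last element; returns the pivot position
    p, m = xs[h - 1], l
    for i in range(l, h - 1):
        if xs[i] < p:
            xs[i], xs[m] = xs[m], xs[i]
            m += 1
    xs[m], xs[h - 1] = xs[h - 1], xs[m]
    return m

def almost_sort(xs, l, h, depth):
    # phase 1: depth-limited quicksort leaving partitions of size <= TAU
    if h - l <= TAU:
        return
    if depth == 0:
        heapsort_slice(xs, l, h)   # depth limit reached: heapsort fallback
        return
    m = partition(xs, l, h)
    almost_sort(xs, l, m, depth - 1)
    almost_sort(xs, m + 1, h, depth - 1)

def insertion_sort(xs):
    # phase 2: one final pass, fast because xs is now almost sorted
    for i in range(1, len(xs)):
        x, j = xs[i], i
        while j > 0 and xs[j - 1] > x:
            xs[j] = xs[j - 1]
            j -= 1
        xs[j] = x

def introsort(xs):
    if len(xs) > 1:
        almost_sort(xs, 0, len(xs), 2 * max(1, int(math.log2(len(xs)))))
        insertion_sort(xs)
    return xs
```

After phase 1, every element sits inside a block of size at most τ around its final position, which is exactly the almost-sortedness that makes the single insertion-sort pass linear.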

While this seems like a quite concrete optimization, the two phases are already visible in the abstract algorithm, which is defined as follows in NREST:

```
introsort xs l h =
  assert (l ≤ h);
  n ← return h − l;                 ($sub)
  ifc n > 1 then                    ($lt)
    xs ← almost_sortspec xs l h;    ($almost_sort)
    xs ← final_sortspec xs l h;     ($final_sort)
    return xs
  else return xs
```

where almost_sortspec (T) specifies an algorithm that almost-sorts a list, consuming at most T resources, and final_sortspec (T) specifies an algorithm that sorts an almost-sorted list, consuming at most T resources.

The program introsort leaves trivial lists unchanged and otherwise executes the first and the second phase. Its resource usage is bounded by the sum of the costs of the two phases plus some overhead for the subtraction, the comparison, and the if-then-else. Using the verification condition generator we prove that introsort is correct, i.e., refines the specification of sorting a slice:

introsort xs l h ≤ ⇓C Eis (slice_sortspec xs l h (\$sort))

where Eis = 0(sort := introsortcost) is the exchange rate used at this step and introsortcost = \$sub + \$if + \$lt + \$almost_sort + \$final_sort is the total allotted cost for introsort.

### **5.3 Quicksort Scheme**

The first phase can be implemented in the following way:

```
1 introsort_aux μ xs l h =
```


where partitionspec partitions a slice into two non-empty partitions, returning the start index m of the second partition, and depthspec specifies ⌈2 log(h − l)⌉.

Let us first analyze the recursive part: if the slice is shorter than the threshold τ, it is simply returned (line 15). If the recursion depth limit has not been reached, the slice is partitioned, using h − l partition_c coins, and the procedure is called recursively for both partitions (lines 10-14). Otherwise, the slice is sorted at a price of μ (h−l) sort_c coins (line 8). The function μ here represents the leading term in the asymptotic costs of the used sorting algorithm, and the sort_c coin can be seen as the constant factor. This currency will later be exchanged into the respective currencies that are used by the sorting algorithm. Note that we use currency sort_c to describe costs per comparison of a sorting algorithm, while currency sort describes the cost of a whole sorting algorithm.

Showing that the procedure results in an almost-sorted list is straightforward. The running time analysis, however, is a bit more involved. We presume a function μ that maps the length of a slice to an upper bound on the abstract steps required for sorting the slice. We will later use heapsort with μnlogn n = n log n.

Consider the recursion tree of a call to introsort_rec: We pessimistically assume that for every leaf in the recursion tree we need to call the fallback sorting algorithm. Furthermore, we have to partition at every inner node, which has cost linear in the length of the current slice. At each subsequent inner level, the lengths of the slices add up to the current one's, and so do the incurred costs. Finally, we have some overhead at every level, including the final one. The cost of the recursive part of introsort_aux is:

introsort_reccost μ (n, d) = \$sort_c ∗ μ n + \$partition_c ∗ (n ∗ d) + \$overhead ∗ (n ∗ (d + 1))

The correctness of the running time bound is proved by induction over the recursion of introsort rec. If the recursion limit is reached (d=0), the first summand pays for the fallback sorting algorithm. If d>0, part of the second summand pays for the partitioning of the current slice, then the list is split into two and the recursive costs are paid for by parts of all three summands. To bound the costs for the fallback sorting algorithm, μ needs to be superadditive: μ a + μ b ≤ μ (a+b). In both cases, the third summand pays for the overhead in the current call.

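The superadditivity requirement is satisfied by μnlogn; a quick check, using monotonicity of the logarithm (with the convention 0 log 0 = 0):

```latex
\mu_{n\log n}\,a + \mu_{n\log n}\,b
  = a\log a + b\log b
  \le a\log(a+b) + b\log(a+b)
  = (a+b)\log(a+b)
  = \mu_{n\log n}\,(a+b)
```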
For d = ⌊2 log n⌋ and an O(n log n) fallback sorting algorithm (μ = μnlogn), introsort reccost μnlogn is in O(n log n).<sup>15</sup> In fact, any d ∈ O(log n) would do.

Before executing the recursive method, introsort aux calculates the depth limit d. The correctness theorem then reads:

introsort aux μnlogn xs l h ≤ ⇓<sup>C</sup> (Eisa (h−l)) (almost sortspec xs l h ($almost sort))

with Eisa n = 0(almost sort := $depth + introsort reccost μnlogn (n, ⌊2 log n⌋)).

Note that specifications typically use a single coin of a specific currency for their abstract operation, which is then exchanged for the actual costs, usually depending on the parameters.

This concludes the interesting part of the running time analysis of the first phase. It is now left to plug in an O(n log n) fallback sorting algorithm, and a linear partitioning algorithm.

Heapsort Independently of introsort, we have proved correctness and worst-case complexity of heapsort, yielding the following refinement lemma:

heapsort xs l h ≤ ⇓<sup>C</sup> (Ehs (h−l)) (slice sortspec xs l h (\$sort))

where Ehs n=0(sort:= c<sup>1</sup> + log n ∗ c<sup>2</sup> + n ∗ c<sup>3</sup> + (n ∗ log n) ∗ c4) for some constants c<sup>i</sup> :: ecost.

Assuming that n ≥ 2,<sup>16</sup> we can estimate Ehs n sort ≤ μnlogn n ∗ c, for c = c<sup>1</sup> + c<sup>2</sup> + c<sup>3</sup> + c<sup>4</sup>, and thus get, for Ehs′ = 0(sort<sup>c</sup> := c):

⇓<sup>C</sup> (Ehs (h−l)) (slice sortspec xs l h ($sort)) ≤ ⇓<sup>C</sup> Ehs′ (slice sortspec xs l h ($sort<sup>c</sup> (μnlogn (h−l))))

and, by transitivity,

heapsort xs l h ≤ ⇓<sup>C</sup> Ehs′ (slice sortspec xs l h ($sort<sup>c</sup> (μnlogn (h−l))))

Note that our framework allowed us to easily convert the abstract currency from a single operation-specific sort coin to a sort<sup>c</sup> coin for each comparison operation.

Partition and Depth Computation We implement partitioning with the Hoare partitioning scheme using the median-of-3 as the pivot element. Moreover, we implement the computation of the depth limit (2 ∗ ⌊log (h−l)⌋) by a loop that counts how often we can divide by two until zero is reached. This yields the following refinement lemmas:

pivot partition xs l h ≤ ⇓<sup>C</sup> Epp (partitionspec xs l h ($partition<sup>c</sup> (h−l)))

calc depth l h ≤ ⇓<sup>C</sup> (Ecd (h−l)) (depthspec l h ($depth))

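The loop computing the depth limit can be sketched as follows (our sketch; the exact off-by-one convention of the formalized calc depth may differ):

```python
def calc_depth(n):
    """Depth limit 2 * floor(log2 n) for a slice of length n >= 1,
    computed without a logarithm primitive: count how often n can be
    divided by two until zero is reached."""
    count = 0
    while n > 0:
        n //= 2
        count += 1
    # count = floor(log2 of the original n) + 1, so correct for the
    # extra halving before doubling
    return 2 * (count - 1)

assert [calc_depth(n) for n in (1, 2, 3, 8, 1000)] == [0, 2, 2, 6, 18]
```
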
<sup>15</sup> More precisely, the sum over all (finitely many) currencies is in O(n log n).

<sup>16</sup> Note that this is a valid assumption, as heapsort will never be called for trivial slices.

Combining the Refinements We replace slice sortspec, partitionspec and depthspec by their implementations heapsort, pivot partition and calc depth. We call the resulting implementation introsort aux2, and prove

introsort aux<sup>2</sup> xs l h ≤ ⇓<sup>C</sup> (Eaux (h−l)) (introsort aux μnlogn xs l h)

where the exchange rate Eaux combines the exchange rates Ehs′, Epp and Ecd for the component refinements.

Transitive combination with the correctness lemma for introsort aux then yields the correctness lemma for introsort aux2:

introsort aux<sup>2</sup> xs l h ≤ ⇓<sup>C</sup> (Eisa<sup>2</sup> (h−l)) (almost sortspec xs l h (\$almost sort))

where Eisa<sup>2</sup> n=0(almost sort:=↓<sup>C</sup> (Eaux n) (introsort auxcost n)) and the operation ↓<sup>C</sup> E t applies an exchange rate to a resource function.

Refining Resources The stepwise refinement approach makes it possible to structure an algorithm verification such that correctness arguments can be conducted on a high level and implementation details can be added later. Resource currencies permit the same for the resource analysis of algorithms: they summarize compound costs, allow reasoning on a higher level of abstraction, and can later be refined into fine-grained costs. For example, in the resource analysis of introsort aux the currencies sort<sup>c</sup> and partition<sup>c</sup> abstract the cost of the respective subroutines. The abstract resource argument is independent of their implementation details, which are only added in a subsequent refinement step, via the exchange rate Eaux.

### **5.4 Final Insertion Sort**

The second phase is implemented by insertion sort, repeatedly calling the subroutine insert. The specification of insert for an index i captures the intuition that it goes from a slice that is sorted up to index i−1 to one that is sorted up to index i. Insertion is implemented by moving the last element to the left, as long as the element left of it is greater (or until the start of the list is reached). Moving an element to its correct position takes at most τ steps, as after the first phase the list is almost sorted, i.e., any element is less than τ positions away from its final position in the sorted list. Moreover, elements originally at positions greater than τ will never reach the beginning of the list, which allows for the unguarded optimization: it omits the bounds check for those elements, saving one index comparison in the innermost loop. Formalizing these arguments yields the implementation final insertion sort that satisfies

final insertion sort xs l h ≤ ⇓<sup>C</sup> (Efis (h−l)) (final sortspec xs l h ($final sort))

where Efis n = 0(final sort := final insertioncost n), and final insertioncost n is linear in n.

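The two insertion routines can be sketched in Python as follows (a behavioral model of the scheme just described, not the verified code; the construction of an almost-sorted demo input is our own):

```python
import random

TAU = 16  # same threshold as in the first phase

def insert_guarded(xs, lo, i):
    """Move xs[i] left while the element before it is greater; the
    `j > lo` test keeps the scan from running off the front."""
    x, j = xs[i], i
    while j > lo and xs[j - 1] > x:
        xs[j] = xs[j - 1]
        j -= 1
    xs[j] = x

def insert_unguarded(xs, i):
    """As above, but without the bounds check: for an element starting
    at position >= lo + TAU of an almost-sorted slice, some smaller
    element to its left acts as a sentinel, so `j > lo` can be omitted."""
    x, j = xs[i], i
    while xs[j - 1] > x:
        xs[j] = xs[j - 1]
        j -= 1
    xs[j] = x

def final_insertion_sort(xs, lo, hi):
    """Phase two: guarded inserts for the first TAU positions,
    unguarded inserts for all remaining positions."""
    for i in range(lo + 1, min(lo + TAU, hi)):
        insert_guarded(xs, lo, i)
    for i in range(lo + TAU, hi):
        insert_unguarded(xs, i)

# build an almost-sorted input: shuffling disjoint blocks of size
# TAU // 2 displaces every element by fewer than TAU positions
random.seed(1)
xs = list(range(1000))
for b in range(0, 1000, TAU // 2):
    block = xs[b:b + TAU // 2]
    random.shuffle(block)
    xs[b:b + TAU // 2] = block
final_insertion_sort(xs, 0, len(xs))
assert xs == list(range(1000))
```
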
Note that final insertion sort and introsort aux<sup>2</sup> use the same currency system. Plugging both refinements into introsort yields introsort<sup>2</sup> and the lemma

introsort<sup>2</sup> xs l h ≤ ⇓<sup>C</sup> (Eis2(h−l)) (introsort xs l h)

where the exchange rate Eis<sup>2</sup> combines the rates Eisa<sup>2</sup> and Ef is.

### **5.5 Separating Correctness and Complexity Proofs**

A crucial function in heapsort is sift down, which restores the heap property by moving the top element down in the heap. To implement this function, we first prove correct a version sift down1, which uses swap operations to move the element. In a next step, we refine this to sift down2, which saves the top element, then executes upward moves instead of swaps, and, after the last step, moves the saved top element to its final position. This optimization spares half of the memory accesses, exploiting the fact that the next swap operation will overwrite an element just written by the previous swap operation.

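In Python, the two versions can be sketched as follows (a behavioral model only; the Isabelle definitions operate on heaps stored in array slices, and the heapsort driver here is just to exercise both variants):

```python
def sift_down_swaps(h, i, n):
    """sift_down1: restore the max-heap property below index i of h[:n]
    by repeatedly swapping with the larger child."""
    while True:
        l, r = 2 * i + 1, 2 * i + 2
        m = i
        if l < n and h[l] > h[m]:
            m = l
        if r < n and h[r] > h[m]:
            m = r
        if m == i:
            return
        h[i], h[m] = h[m], h[i]   # full swap: two writes per level
        i = m

def sift_down_moves(h, i, n):
    """sift_down2: save the top element once, move larger children up,
    and write the saved element to its final slot at the end --
    roughly half the memory writes of the swap-based version."""
    x = h[i]
    while True:
        l, r = 2 * i + 1, 2 * i + 2
        if l >= n:
            break
        m = l
        if r < n and h[r] > h[l]:
            m = r
        if h[m] <= x:
            break
        h[i] = h[m]               # upward move: one write per level
        i = m
    h[i] = x                      # single final write

def heapsort(xs, sift):
    """Standard heapsort, parameterized by the sift-down variant."""
    n = len(xs)
    for i in range(n // 2 - 1, -1, -1):
        sift(xs, i, n)
    for end in range(n - 1, 0, -1):
        xs[0], xs[end] = xs[end], xs[0]
        sift(xs, 0, end)
    return xs

data = [5, 3, 8, 1, 9, 2, 7, 4, 6, 0] * 3
assert heapsort(data[:], sift_down_swaps) == sorted(data)
assert heapsort(data[:], sift_down_moves) == sorted(data)
```
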
However, this refinement is not structural: it replaces swap operations by move operations, and adds an additional move operation at the end. At this point, we chose to separate the functional correctness and resource aspects, to avoid the complexity of a combined non-structural functional and currency refinement. It turns out that proving the complexity of the optimized version sift down<sup>2</sup> directly is straightforward. Thus, as sketched in Section 2.4, we first prove<sup>17</sup> sift down<sup>2</sup> ≤ sift down<sup>1</sup> ≤ sift downspec (∞), ignoring the resource aspect. Separately, we prove sift down<sup>2</sup> ≤<sup>n</sup> SPEC (λ . True) sift downcost, and combine the two statements to get sift down<sup>2</sup> ≤ sift downspec sift downcost.

### **5.6 Refining to LLVM**

The above abstract programs implicitly come with a fixed type and comparison operator for the elements of the list to be sorted. Those programs use abstract operations and currencies for arithmetic operations on indexes, control flow, comparisons and read/write of a random-access iterator (abstracted by lists with update and lookup operations).

When we further assume an LLVM program that refines the comparison operator in LLVM, and specify how the random-access data structure should be implemented — we choose arrays — we can automatically synthesize an LLVM program introsort impl that refines introsort2, i.e., satisfies the theorem:

$$hnr \ (array\_A \ p \ xs \star snat\_A \ l^{\dagger} \ l \star snat\_A \ h^{\dagger} \ h) \\ (introsort\_impl \ p \ l^{\dagger} \ h^{\dagger}) \\ (snat\_A \ l^{\dagger} \ l \star snat\_A \ h^{\dagger} \ h) \ array\_A \ (introsort\_2 \ xs \ l \ h)$$

Combination with the refinement lemmas for introsort<sup>2</sup> and introsort, followed by conversion to a Hoare triple, yields our final correctness statement:

l ≤ h ∧ h < length xs<sup>0</sup> =⇒

{$(introsort implcost (h−l)) ⋆ array<sup>A</sup> p xs<sup>0</sup> ⋆ snat<sup>A</sup> l† l ⋆ snat<sup>A</sup> h† h}
introsort impl p l† h†
{λr. ∃A xs. array<sup>A</sup> r xs ⋆ ↑(slice sort aux xs<sup>0</sup> l h xs) ⋆ snat<sup>A</sup> l† l ⋆ snat<sup>A</sup> h† h}

where introsort implcost :: nat → ecost is the cost bound obtained from applying the exchange rates Eis and then Eis<sup>2</sup> to $sort.

<sup>17</sup> Note that we have omitted the function parameters for better readability.

Note that this statement is independent of the Refinement Framework. Thus, to believe in its meaningfulness, one only has to check the formalization of Hoare triples, separation logic, and the LLVM semantics.

To formally prove the statement "introsort impl has complexity O(n log n)", we observe that introsort implcost uses only finitely many currencies, and only finitely many coins of each currency. We define the overall number of coins as

introsort implallcost n = Σc. introsort implcost n c

which expands to

introsort implallcost n = 4693 + 5 ∗ log n + 231 ∗ n + 455 ∗ (n ∗ log n)

which, in turn, is routinely proved to be in O(n log n).

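This last step can be replayed numerically (a sketch: we read log as the real base-2 logarithm here, and use the sum of the coefficients as an explicit constant):

```python
from math import log2

def introsort_impl_allcost(n):
    """Closed-form total coin count quoted above (log read as log2)."""
    return 4693 + 5 * log2(n) + 231 * n + 455 * (n * log2(n))

# for n >= 2 we have log2(n) >= 1 and n * log2(n) >= 2, so each
# summand is at most its coefficient times n * log2(n); the sum of
# the coefficients is therefore an explicit O(n log n) witness
C = 4693 + 5 + 231 + 455
assert all(introsort_impl_allcost(n) <= C * n * log2(n) for n in range(2, 10000))
```
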
As a last step, we instantiate the element type to 64-bit unsigned integers and the comparison operation to LLVM's icmp ult instruction, to obtain a program that sorts integers in ascending order. Our code generator can export this to actual LLVM text and a corresponding header file for interfacing our sorting algorithm from C or C++.

As LLVM does not support generics, we cannot implement a replacement for C++'s generic std::sort<T>. However, by repeating the last step for different types and compare operators, we can implement a replacement for any fixed T.

### **5.7 Benchmarks**

In this section we present benchmarks comparing the code extracted from our formalization with the real-world implementation of introsort from the GNU C++ Library (libstdc++). Also, as a regression test, we compare with the code extracted from an earlier formalization of introsort [17] that did not verify the running time complexity and used an earlier iteration of the Sepref framework and LLVM semantics without time.

The results are shown in Figure 1. As expected, all three implementations have similar running times. Note that the small differences are well within the noise of the measurements. We conclude that adding the complexity proof to our introsort formalization, and the time aspect to our refinement process has not introduced any timing regressions in the generated code. Note, however, that the code generated by our current formalization is not identical to what the original formalization generated. This is mainly due to small changes in the formalization introduced when adding the timing aspect.

# **6 Conclusions**

We have presented a refinement framework for the simultaneous verification of functional correctness and complexity of algorithm implementations with competitive practical performance.

We use stepwise refinement to separate high-level algorithmic ideas from low-level optimizations, enabling convenient verification of highly optimized algorithms. The novel concept of resource currencies also allows structuring of the

**Fig. 1.** Comparison of the running time measured for the code generated by the formalization described in this paper (Isabelle-LLVM), the original formalization from [17] (notime), and the libstdc++ implementation. Arrays with 10<sup>8</sup> uint64s with various distributions were sorted, and we display the smallest time of 10 runs. The programs were compiled with clang-10 -O3, and run on an Intel XEON E5-2699 with 128GiB RAM and 256K/55M L2/L3 cache. See [17] for details of the benchmarking method.

complexity proofs along the refinement chain. Our framework refines down to the LLVM intermediate representation, such that we can use a state-of-the-art compiler to generate performant programs.

As a case-study, we have proved the functional correctness and complexity of the introsort sorting algorithm. Our verified implementation performs on par with the (unverified) state-of-the-art implementation from the GNU C++ Library. It also provably meets the C++11 standard library [7] specification for std::sort, which in particular requires a worst-case time complexity of O(n log n). We are not aware of any other verified real-world implementations of sorting algorithms that come with a complexity analysis.

Our work is a combination and substantial extension of an earlier refinement framework for functional correctness [15], which also comes with a verification of introsort [17], and a refinement framework for a single enat-valued currency [11]. In particular, we have generalized the refinement framework to arbitrary resources, introduced currencies that help organize refinement proofs, extended the LLVM semantics and reasoning infrastructure with a cost model, connected it to the refinement framework via a new version of the Sepref tool, and, finally, added the complexity analysis for introsort.

### **6.1 Related Work**

Nipkow et al. [23, §4.1] collect verification efforts concerning sorting algorithms. We add a few instances that verify running time: Wang et al. use TiML [25] to automatically verify the correctness and asymptotic time complexity of mergesort.

Zhan and Haslbeck [26] verify the functional correctness and asymptotic running time of imperative versions of insertion sort and mergesort. We build on earlier work by Lammich [17] and provide the first verification of the functional correctness and asymptotic running time of heapsort and introsort.

The idea to generalize the nres monad [19] to resource types originates from Carbonneaux et al. [4]. They use potential functions (state → enat) instead of predicates (state → bool), present a quantitative Hoare logic and extend the CompCert compiler to preserve properties of stack-usage from programs in Clight to compiled programs.

We see our paper in the line of research concerning simultaneously verifying functional correctness and worst-case time complexity of algorithms. Atkey [1] pioneered resource analysis with separation logic; Guéneau et al. [9] present a framework that uses time credits in Coq and apply it to involved algorithms and data structures [10,6]. We further develop their work in three ways: First, while time credits are usually natural numbers [1,9,26,21,6] or integers [10], we generalize to an abstract resource type and specifically use resource currencies for a fine-grained analysis. Second, we use stepwise refinement to structure the verification and make the resource analysis of larger use-cases manageable. Third, we provide facilities to automatically extract efficient competitive code from the verification. The following are the most complex algorithms and data structures with verified running time analysis using time credits and separation logic that we are aware of: a linear time selection algorithm [26], an incremental cycle detection algorithm [10], Union-Find [6], and the Edmonds-Karp and Kruskal algorithms [11].

### **6.2 Future Work**

A verified compiler down to machine code would further reduce the trusted code base of our approach. While that is not expected to be available soon for LLVM in Isabelle, the NREST-monad and the Sepref tool are general enough to connect to a different back end. Formalizing one of the CompCert C semantics [2] in Isabelle, connecting it to the NREST-monad and then processing synthesized C code with CompCert's verified compiler would be a way to go.

In this paper we apply our framework to verify an involved algorithm that only uses basic data structures, i.e., arrays. A next step is to verify more involved data structures, e.g., by porting existing verifications of the Imperative Collections Framework [16] to LLVM. We do not yet see how to reason about the running time of data structures like hash maps, where worst-case analysis would be possible but not useful. In general, extending the framework to average-case analysis and to probabilistic programs are exciting roads to take.

We plan to implement more automation, saving the user from writing boilerplate code when handling resource currencies and exchange rates.

Neither the LLVM nor the NREST level of our framework is tied to running time. Applying it to other resources like maximum heap space consumption might be a next step.

# **References**


26. Zhan, B., Haslbeck, M.P.L.: Verifying asymptotic time complexity of imperative programs in Isabelle. In: Galmiche, D., Schulz, S., Sebastiani, R. (eds.) Automated Reasoning - 9th International Joint Conference, IJCAR 2018. Lecture Notes in Computer Science, vol. 10900, pp. 532–548. Springer (2018). https://doi.org/10.1007/978-3-319-94205-6_35

**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/ 4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Run-time Complexity Bounds Using Squeezers

Oren Ish-Shalom<sup>1</sup> , Shachar Itzhaky<sup>2</sup>, Noam Rinetzky<sup>1</sup>, and Sharon Shoham<sup>1</sup>

<sup>1</sup> Tel Aviv University, Tel Aviv, Israel tuna.is.good.for.you@gmail.com <sup>2</sup> Technion, Haifa, Israel

Abstract. Determining upper bounds on the time complexity of a program is a fundamental problem with a variety of applications, such as performance debugging, resource certification, and compile-time optimizations. Automated techniques for cost analysis excel at bounding the resource complexity of programs that use integer values and linear arithmetic. Unfortunately, they fall short when execution traces become more involved, especially when data dependencies may affect the termination conditions of loops. In such cases, state-of-the-art analyzers have been shown to produce loose bounds, or even no bound at all.

We propose a novel technique that generalizes the common notion of recurrence relations based on ranking functions. Existing methods usually unfold one loop iteration, and examine the resulting relations between variables. These relations assist in establishing a recurrence that bounds the number of loop iterations. We propose a different approach, where we derive recurrences by comparing *whole traces* with *whole traces* of a lower rank, avoiding the need to analyze the complexity of intermediate states. We offer a set of global properties, defined with respect to whole traces, that facilitate such a comparison, and show that these properties can be checked efficiently using a handful of local conditions. To this end, we adapt *state squeezers*, an induction mechanism previously used for verifying safety properties. We demonstrate that this technique encompasses the reasoning power of bounded unfolding, and more. We present some seemingly innocuous, yet intricate, examples that previous tools based on *cost relations* and control flow analysis fail to solve, but that our squeezer-powered approach handles successfully.

# 1 Introduction

Cost analysis is the problem of estimating the resource usage of a given program, over all of its possible executions. It complements functional verification—of safety and liveness properties—and is an important task in formal software certification. When used in combination with functional verification, cost analysis ensures that a program is not only correct, but completes its processing in a reasonable amount of time, uses a reasonable amount of memory, communication bandwidth, etc. In this work we focus on run-time complexity analysis. While the area has been studied extensively, e.g., [19], [28], [3], [14], [6], [16], [21], [12], [9], the general problem of constraining the number of iterations in programs containing loops with arbitrary termination conditions remains hard.

A prominent approach to computing upper bounds on the time complexity of a program identifies a well-founded numerical measure over program states that decreases in

N. Yoshida (Ed.): ESOP 2021, LNCS 12648, pp. 320–347, 2021. https://doi.org/10.1007/978-3-030-72019-3\_12

```
void binary_counter(unsigned int n) {
 unsigned int c[n];
 memset(c,0,n*sizeof(unsigned int));
 int i=0;
 while (i < n) {
   if (c[i] == 1) /*scan 1-prefix*/{c[i] = 0; i++; }
   else /*increment*/ {c[i] = 1; i=0; print(c);}
 }}
```
Fig. 1. A program that produces all combinations of n bits.

every step of the program, also called a *ranking function*. In this case, an upper bound on the measure of the initial states comprises an upper bound on the program's time complexity. Finding such measures manually is often extremely difficult. The *cost relations* approach, dating back to [28], attempts to automate this process by using the control flow graph of the program to extract recurrence formulas that characterize this measure. Roughly speaking, the recurrences relate the measures (costs) of adjacent nodes in the graph, taking into account the cost of the step between them. In this way, the cost relations track the evolution of the measure between *every* pair of consecutive states along the executions of the program.

One limitation of cost relations is the need to capture the number of steps remaining for execution in *every* state, that is, all intermediate states along all executions. If the structure of the state is complex, this may require higher order expressions, e.g., summing over an unbounded number of elements. As an example, consider the program in Fig. 1 that implements a binary counter represented by an array of bits.

In this case, a ranking function that decreases between every two consecutive iterations of the loop, or even between two iterations that print the value of the counter, depends on the *entire* content of the array. Attempting to express a ranking function over the scalar variables of this program is analogous to abstracting the loop as a finite-state system that ignores the content of the array, and as such contains transition cycles (e.g. the abstract state n ↦ n0, i ↦ 0, obtained by projecting the state to the scalar variables only, repeats multiple times in any trace), meaning that no strictly decreasing function can be defined in this way. Similarly, any attempt to consider a bounded number of bits will encounter the same difficulty.

In this paper, we propose a novel approach for extracting recurrence relations capturing the time complexity of an imperative program, modeled as a transition system, by relating whole traces instead of individual states. The key idea is to relate a trace to (one or more) shorter traces. This allows us to formulate a recurrence that resolves to the length of the trace and recurs over the values of the initial states only. We sidestep the need to take into account the more complex parts of the state that change along the trace (e.g., in the case of the binary counter, the array is initialized with zeros).

Our approach relies on the notion of *state squeezers* [22], previously used exclusively for the verification of safety properties. We present a novel aspect where the same squeezers can be used to determine complexity bounds, by replacing the safety property check with trace length judgements.

Squeezers provide a means to perform induction on the "size" of (initial) states to prove that all reachable states adhere to a given specification. This is accomplished by attaching *ranks* from a well-founded set to states, and defining a *squeezer function* that maps states to states of a lower rank. Note that the notion of a rank used in our work is distinct from that of a ranking function, and the two should not be confused; in particular, a rank is not required to decrease on execution steps. Previously, squeezers were utilized for safety verification: the ability to establish safety is achieved by having the squeezer map states in a way that forms a (relaxed form of) a *simulation relation*, ensuring that the traces of the lower-rank states simulate the traces of the higher-rank states. Due to the simulation property, which is verified locally, safety over states with a *base* rank carries over (by induction over the rank) to states of any higher rank.

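To make this concrete, the following sketch instantiates the idea for the binary counter of Fig. 1, using the squeezer later sketched in Fig. 2, which drops the least significant bit (the state representation and the index adjustment are our own choices). It checks empirically that squeezing any reachable state of the rank-n instance yields a reachable state of the rank-(n−1) instance:

```python
def trace(n):
    """All states (i, c) reached by the binary counter of Fig. 1,
    including the terminal state with i = n."""
    c, i = [0] * n, 0
    states = [(i, tuple(c))]
    while i < n:
        if c[i] == 1:          # scan 1-prefix
            c[i] = 0
            i += 1
        else:                  # increment
            c[i] = 1
            i = 0
        states.append((i, tuple(c)))
    return states

def squeeze(state):
    """Drop the least significant bit and shift the scan index
    accordingly (our guess at a suitable squeezer for this program)."""
    i, c = state
    return (max(i - 1, 0), c[1:])

# the squeezer maps reachable states of rank n to reachable states of
# rank n - 1; this is the kind of property that can then be verified
# locally, state by state, instead of over whole traces
for n in range(2, 9):
    lower = set(trace(n - 1))
    assert all(squeeze(s) in lower for s in trace(n))
```
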
In this work, we use the construction of well-founded ranks and squeezers to define a *recurrence formula* representing (an upper bound on) the time complexity of the procedure being analyzed. We do so by expressing the complexity (length) of traces in terms of the complexity of lower-rank traces. This new setting raises additional challenges: it is no longer sufficient to relate traces to lower-rank traces; we also need to *quantify the discrepancy* between the lengths of the traces, as well as between their ranks. This is achieved by a certain form of simulation that is parameterized by *stuttering shapes* (for the lengths) and by means of a *rank bounding function* (for the ranks). Furthermore, while [22] limits each trace to relate to a *single* lower-rank trace, we have found that it is sometimes beneficial to employ a *decomposition* of the original trace into *several* consecutive *trace segments*, so that each segment corresponds to *some* (possibly different) lower-rank trace. The segmentation simplifies the analysis of the length of the entire trace, since it creates sub-analyses that are easier to carry out, and the sum of which gives the desired recurrence formula. This also enables a richer set of recurrences to be constructed automatically, namely non-single recurrences (meaning that the recursive reference may appear more than once on the right hand side of the equation).

The base case of the recurrence is obtained by computing an upper bound on the time complexity of base-rank states. This is typically a simpler problem that may be addressed, e.g., by symbolic execution due to the bounded nature of the base. The solution to the recurrence formula with the respective base case soundly overapproximates the time complexity of the procedure.

We show that, conceptually, the classical approach for generating recurrences based on ranking functions can be viewed as a special case of our approach where the squeezer maps a state to its immediate successor. The real power of our approach is in the freedom to define other squeezers, producing simpler recursions, and avoiding the need for complex ranking functions.

Our use of squeezers for extracting recurrences that bound the complexity of imperative programs is related to the way analyses for functional programs (e.g. [20]) use the term(s) in recursive function calls to extract recurrences. The functional programming style coincidentally provides such candidate terms. The novelty of our approach is in introducing the concept of a squeezer explicitly, leading to a more flexible analysis as it does not restrict the squeezer to follow specific terms in the program. In particular, this allows reasoning over space in imperative programs as well.

The main results of this paper can be summarized as follows:


# 2 Overview

In this section we give a high level description of our technique for complexity analysis using the binary counter example in Fig. 1.

*Example: Binary counter* The procedure in Fig. 1 receives as an input a number n of bits and iterates over all their possible values in the range 0 ... 2<sup>n</sup> − 1. The "current" value is maintained in an array c which is initialized to zero and whose length is n; c[0] represents the least significant bit. The loop scans the array from the least significant bit forward, looking for the leftmost 0 and zeroing the prefix of 1s. As soon as it encounters a 0, it sets it to 1 and starts the scan from the beginning. The program terminates when it reaches the end of the array (i = n), all array entries are zeros, and the last value was 111...; at this point all the values have been enumerated.

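A direct simulation of the loop makes the behavior tangible (our sketch; the closed form 2<sup>n+1</sup> − 2 for the iteration count is an empirical observation for small n, not a claim from the paper — the point is only that the count is exponential in n):

```python
def binary_counter_steps(n):
    """Transliteration of the loop in Fig. 1, counting its iterations."""
    c, i, steps = [0] * n, 0, 0
    while i < n:
        steps += 1
        if c[i] == 1:      # scan 1-prefix
            c[i] = 0
            i += 1
        else:              # increment
            c[i] = 1
            i = 0
    return steps

# iteration count grows exponentially in the number of bits
assert ([binary_counter_steps(n) for n in range(1, 11)]
        == [2 ** (n + 1) - 2 for n in range(1, 11)])
```
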
*Existing analyses* All recent methods that we are aware of (such as [16,4,20]) fail to analyze the complexity of this procedure (in fact, most methods will fail to realize that the loop terminates at all). One reason for that is the need to model the contents of the array, whose size is unknown at compile time. However, even if data *were* modeled somehow and taken into account, finding a ranking function, which underlies existing approaches, is hard, since this function is required to decrease between *any* two consecutive iterations along *any* execution. Here, for instance, to the best of our knowledge, such a function would depend on an unbounded number of elements of the array; it would need to extract the current value as an integer, along the lines of Σ<sub>j=0</sub><sup>n−1</sup> c[j] · 2<sup>j</sup>.

The use of a ranking function for complexity analysis is somewhat analogous to the use of inductive invariants in safety verification. Both are based on induction over time along an execution. This paper is inspired by previous work [22] showing that verification can also be done when the induction is performed on the size (*rank*) of the state rather than on the number of iterations, where the size of the state may correspond, e.g., to the size of an unbounded data structure. We argue that similar concepts can be applied in a framework for complexity classification. That is, we try to infer a recurrence relation that is *based on the rank* of the state and correlates the lengths of *complete* executions—executions that start from an initial state—of different ranks. This sidesteps the need to express the length of *partial* executions, which start from intermediate states. While the approach applies to bounded-state systems as well, its benefits become most apparent when the program contains a-priori unbounded stores, such as arrays.

*Our approach.* Roughly speaking, our approach for computing recurrence formulas that provide an upper bound on the complexity of a procedure is based on the following ingredients:


All of these ingredients are synthesized automatically, as we discuss in Section 4. Next, we elaborate on each of these ingredients, and illustrate them using the binary counter example. We further demonstrate how we use these ingredients to find recurrence formulas describing (an upper bound on) the complexity of the program.

*Some notations* We adopt a standard encoding of a program as a transition system over a state space Σ, with a set of initial states init ⊆ Σ and transition function tr : Σ → Σ, where a transition corresponds to a loop iteration. We use reach ⊆ Σ to denote the set of reachable states, reach = {σ | ∃σ0, k. tr<sup>k</sup>(σ0) = σ ∧ σ0 ∈ init}.

*Defining the rank of a state* Ranks are taken from a well-founded set (X, ≺) with a basis B ⊆ X that contains all the minimal elements of X. The rank function, r : init → X, aims to abstract away irrelevant data from the (initial) state that does *not* affect the execution time, and to keep only the state "features" that do. When proper ranks are used, the rank of an initial state is all that is needed to provide a tight bound on its trace length. Since ranks are taken from a well-founded set, they can be recursed over. In the binary counter example, the chosen rank is n, namely, the rank function maps each state to the size of the array. (Notice that the rank does not depend on the contents of the array; in contrast, bounding the trace length from any intermediate state, and not just from initial states, would have required considering the content of the array.)
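To make the notation concrete, the following Python sketch models the binary counter as a transition system (Σ, init, tr) together with the rank function. The loop body is our reconstruction of the example (Fig. 1 is not reproduced in this excerpt), chosen so that a trace of rank n performs 2^{n+1} − 2 iterations, consistent with the bounds derived below; all names are ours.

```python
def init(n):
    """An initial state of rank n: an all-zero counter array, index at 0."""
    return ([0] * n, 0)

def tr(state):
    """One loop iteration; terminal states (i >= n) map to themselves."""
    c, i = state
    c = list(c)
    if i >= len(c):          # terminal: tr(sigma) = sigma
        return (c, i)
    if c[i] == 0:
        c[i] = 1             # set the bit, restart the scan
        i = 0
    else:
        c[i] = 0             # clear the bit, carry to the next position
        i += 1
    return (c, i)

def comp_s(state, fuel=10**6):
    """comp_s: number of transitions until a terminal state is reached."""
    steps = 0
    while state[1] < len(state[0]) and steps < fuel:
        state = tr(state)
        steps += 1
    return steps

def rank(state):
    """The rank of an initial state: the array size n (contents ignored)."""
    return len(state[0])
```

For instance, `comp_s(init(3))` runs the 3-bit counter to termination, while `rank` deliberately ignores the array contents, as discussed above.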

Given the rank function, our analysis extracts a recurrence formula for the complexity function comp_x : X → ℕ ∪ {∞} that provides an upper bound on the number of iterations of tr based on the rank of the *initial states*. In our exposition, we sometimes

Fig. 2. Correspondence between two traces of the binary counter program. The squeezer removes the leftmost array entry, which represents the least significant bit. The rank is the array size, i.e., four on the upper trace and three on the lower one. The simulation includes only 1-, 2- and 3-steps, so the length of the upper trace is at most three times that of the lower trace, yielding an overall complexity bound of O(3^n).

also refer to a time complexity function over states, comp_s : init → ℕ ∪ {∞}, which is defined directly on the (initial) states, as the number of iterations in an execution that starts with some σ0 ∈ init.

*Defining a squeezer* The squeezer ∇ : Σ → Σ is a function that maps states to states of lower-rank traces (where the rank of a trace is determined by the rank of its initial state), down to the base ranks B. Its importance is in defining a correspondence between higher-rank traces and lower-rank ones that can be verified locally, by examining individual states rather than full traces. The kind of correspondence that the squeezer is required to ensure affects the flexibility of the approach and the kind of recurrence formulas that it may yield. To start off, consider a rather naive squeezer that satisfies the following local properties:

	- initial anchor: σ0 ∈ init ⇒ ∇(σ0) ∈ init,
	- k-step: σ ∈ reach ⇒ ∃k. tr(∇(σ)) = ∇(tr^k(σ)).

As an example, the squeezer we consider for the binary counter program is rather intuitive: it removes the least significant bit (c[0]), and adjusts the index i accordingly. Doing so yields a state with rank r(∇(σ0)) = r(σ0) − 1. Fig. 2 shows the correspondence between a 4-bit binary counter and a 3-bit one. The figure illustrates the k-step simulation property for k = 1, 2, 3: σ0 and σ3 are (3, 1)-stuttering, σ1 and σ4 are (2, 1)-stuttering, and σ2, σ5 and σ6 are (1, 1)-stuttering.
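This squeezer, together with a local search for the smallest k witnessing the k-step property, can be sketched as follows (the loop body is our reconstruction of the counter; all names are ours):

```python
def tr(state):
    """One counter iteration (reconstruction of Fig. 1's loop body)."""
    c, i = state
    c = list(c)
    if i >= len(c):
        return (tuple(c), i)      # terminal: maps to itself
    if c[i] == 0:
        c[i], i = 1, 0            # set the bit, restart the scan
    else:
        c[i], i = 0, i + 1        # clear the bit, carry
    return (tuple(c), i)

def squeeze(state):
    """The squeezer: remove the least significant bit, adjust the index."""
    c, i = state
    return (c[1:], max(i - 1, 0))

def stutter_k(sigma, k_max=4):
    """Smallest k with tr(squeeze(sigma)) == squeeze(tr^k(sigma)),
    i.e. the witness that sigma is (k, 1)-stuttering; None if none found."""
    target = tr(squeeze(sigma))
    s = sigma
    for k in range(1, k_max + 1):
        s = tr(s)
        if squeeze(s) == target:
            return k
    return None
```

On a 2-bit counter this reproduces the stuttering pattern of Fig. 2: the initial state is (3, 1)-stuttering, its successor (2, 1)-stuttering, and the next state (1, 1)-stuttering.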

The simulation property induces a correlation between a higher-rank trace τ and a lower-rank one τ′, such that every step of τ′ is matched by k steps in τ. Whenever a state σ satisfies the k-step property, we will refer to it as being (k, 1)-*stuttering*. (We usually only care about the smallest k that satisfies the property for a given σ.) Now suppose that there exists some k̂ ∈ ℕ⁺ such that for every trace τ(σ0) and every state σ ∈ τ(σ0), σ is (k, 1)-stuttering with 1 ≤ k ≤ k̂. This would yield the following complexity bound:

$$comp\_s(\sigma\_0) \le \hat{k} \cdot comp\_s(\nabla(\sigma\_0)).\tag{1}$$

*All your base* <sup>3</sup> What should happen if we repeatedly apply ∇ to some initial state σ0, each time obtaining a new, lower-rank trace? Since r(∇(σ0)) ≺ r(σ0), and since (X, ≺) is well-founded, we will eventually hit some state of *base rank*:

$$\nabla(\nabla(\cdots \nabla(\sigma\_0) \cdots)) = \sigma\_0^\diamond \quad \text{such that} \quad r(\sigma\_0^\diamond) \in B$$

Hence, if we know the complexity of the initial states with a base rank, we can apply Eq. (1) iteratively to compute an upper bound on the complexity of *any* initial state.

How many steps will be needed to get from an arbitrary initial state σ0 to σ0^⋄? Clearly, this depends on the rank, and the way in which ∇ decreases it.

Consider the binary counter program again, with the rank r (σ) = n. (N, <) is well-founded, with a single minimum 0. If we define, e.g., B = {0, 1}, we know that the length of any trace with n ∈ B is bounded by a constant, 2. (Bounding the length of traces starting from an initial state σ<sup>0</sup> where r (σ0) ∈ B can be done with known methods, e.g., symbolic execution). Since the rank decreases by 1 on each "squeeze", we get the following exponential bound:

$$comp\_s(\sigma\_0) \le 2 \cdot 3^{n-1} = O(3^n) \tag{2}$$

The last logical step, going from (1) to (2), is, in fact, highly involved: since Eq. (1) is a mapping of *states*, solving such a recurrence for arbitrary ∇ cannot be carried out using known automated methods. Instead, we implicitly used the rank of the state, n, to extract a recurrence over scalar values and obtain a closed-form expression. Let us make this reasoning explicit by first expressing Eq. (1) in terms of comp_x instead of comp_s:

$$comp\_x(n) \le \hat{k} \cdot comp\_x(n-1)$$

Here, n − 1 denotes the rank obtained when squeezing an initial state of rank n. Unlike Eq. (1), this is a recurrence formula over (ℕ, <) that may be solved algorithmically, leading to the solution comp_x(n) = O(3^n).
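Unfolding this recurrence down to the base B = {0, 1} (with comp_x(1) ≤ 2, as computed below for the base case) gives a closed-form bound; a one-line sketch, with names of our choosing:

```python
def comp_x_bound(n, k_hat=3, base=2):
    """Closed form of comp_x(n) <= k_hat * comp_x(n - 1) with
    comp_x(1) <= base, obtained by unfolding n - 1 times (n >= 1)."""
    return base * k_hat ** (n - 1)
```

For k̂ = 3 and base value 2 this is exactly the 2 · 3^{n−1} = O(3^n) bound of Eq. (2).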

*Surplus analysis* Assuming the worst k̂ for all the states in the trace can be too conservative; in particular, when only a few states satisfy the k-step property with k > 1, and all the others satisfy the 1-step property. In that case, if we know that at most b states in any one trace have k > 1, we can formulate the tighter bound:

$$comp\_s(\sigma\_0) \le comp\_s(\nabla(\sigma\_0)) + \widehat{k} \cdot b \tag{3}$$

Incidentally, in the current setting of the binary counter program, the number of k-steps (3-steps) is *not* bounded. So we cannot apply the inequality (3) repeatedly on any trace, as the number of 3-steps depends on the initial state. However, we can improve the analysis by partitioning the trace into two parts, as we explain next.

<sup>3</sup> https://knowyourmeme.com/memes/all-your-base-are-belong-to-us

*Segments and mini-traces* Note that both (1) and (3) "suffer" from an inherent restriction: the right-hand side contains *exactly* one recursive reference. As such, they are limited in expressing certain kinds of complexity classes.

In order to obtain more diverse recurrences, including ones with more than one recursive reference, we propose an extension of the simulation property that allows more than one lower-rank trace:

– *partitioned* simulation


This definition allows a new mini-trace to start at any point along a higher-rank trace τ, thus marking the beginning of a new segment of τ. When this occurs, we call tr(σ) a *switch state*. For the sake of uniformity, we also refer to all initial states σ0 ∈ init as switch states. Hence, each segment of τ starts with a switch state, and the mini-traces are the lower-level traces that correspond to the segments (these are the traces that start from ∇(σs), where σs is a switch state). The length of τ can now be expressed as the *sum* of the lengths of the lower-level mini-traces.

However, there are two problems remaining. First, we need to extend the "rank decrease of non-base initial states" requirement to any switch state in order to ensure that the ranks of all mini-traces are indeed lower. Namely, we need to require that if σs is any switch state in a trace from σ0, then r(∇(σs)) ≺ r(σ0). Second, even if we extend the rank decrease requirement, this definition does not suggest a way to bound the number of correlated mini-traces and their respective ranks, and therefore suggests no effective way to produce an equation for comp_s as before.

To sidestep the problem of a potentially unbounded number of mini-traces, we augment the definition of simulation with a *trace partition* function; to address the challenge of the rank decrease we use a *rank-bounding* function, which is responsible both for ensuring that the rank of the mini-traces decreases and for bounding their ranks.

*Defining a partition* We define a function p_d : Σ → {1,...,d}, parameterized by a constant d, called a *partition function*, that is weakly monotone along any trace (p_d(σ) ≤ p_d(tr(σ))). This function induces a partition of any trace τ into (at most) d segments by grouping states based on the value of p_d(σ). To ensure the segments and mini-traces are aligned, we require that switch states only occur at segment boundaries.
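On concrete (finite) traces, weak monotonicity of a candidate partition function is directly checkable; a generic sketch (the helper name is ours):

```python
def is_weakly_monotone(p_d, trace):
    """Check p_d(sigma_i) <= p_d(sigma_{i+1}) for consecutive states
    of a concrete (finite) trace."""
    vals = [p_d(s) for s in trace]
    return all(a <= b for a, b in zip(vals, vals[1:]))
```

For instance, on a toy integer trace, `is_weakly_monotone(lambda s: 1 if s < 3 else 2, [0, 1, 2, 3, 4])` holds, while an oscillating candidate such as `lambda s: s % 2` is rejected.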

– d-*partitioned* simulation:


In our running example, let us change ∇ so that it shrinks the state by removing the *most* significant bit instead of the least. This leads to a partition of the execution trace for r(σ0) = n into two segments, as shown in Fig. 3. The partition function is p_d(σ) = (i ≥ n || c[n − 1]) ? 2 : 1 (essentially, c[n − 1] + 1, except that the final state is slightly different). As can be seen from the figure, each segment simulates a mini-trace

Fig. 3. An execution trace of the binary counter program that corresponds to two mini-traces of lower rank.

of rank n − 1, with k = 1 for all the steps except for the last step (at σ28) where k = 2. In this case, it would be folly to use the recurrence (1) with k̂ = 2, since all the steps are 1:1 except one. Instead, we can formulate a tighter bound:

$$\operatorname{comp}\_s(\sigma\_0) \le \operatorname{comp}\_s(\sigma\_0') + \operatorname{comp}\_s(\sigma\_0'') + 2$$

where comp_s(σ0′) and comp_s(σ0″) are the lengths of the mini-traces, and 2 is the surplus from the switch transition σ14 → σ15 plus the 2-step at σ28. In the case of this program, we know that r(σ0′) = r(σ0″) = r(σ0) − 1 for any initial state σ0; therefore, turning to comp_x, we can derive and solve the recurrence comp_x(n) = 2 · comp_x(n − 1) + 2, which together with the base yields the bound:

$$comp\_x(n) = 2^{n+1} - 2$$
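A quick sanity check (ours, not part of the formal development) that this closed form solves the recurrence with base comp_x(1) = 2:

```python
def comp_x(n):
    """Closed-form bound for the binary counter."""
    return 2 ** (n + 1) - 2

def comp_x_rec(n):
    """The recurrence comp_x(n) = 2 * comp_x(n - 1) + 2, base comp_x(1) = 2."""
    return 2 if n == 1 else 2 * comp_x_rec(n - 1) + 2
```

Both functions agree for every n ≥ 1, confirming that 2^{n+1} − 2 satisfies the recurrence.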

Clearly, a general condition is required in order to identify the ranks of the corresponding initial states of the (lower-rank) mini-traces (and at the same time, ensure that they decrease).

*Bounding the ranks of squeezed switch states* This is not a trivial task since, as previously noted, the squeezed ranks could be different, and may depend on properties of the corresponding switch states. To achieve this goal, once a partition function p_d is defined, we also define a rank-bounding function ∇̂ : X × {1,...,d} → X, where for any σ0 ∈ init and switch state σs, ∇̂ provides a bound for the rank of ∇(σs) based on that of σ0:

$$r(\nabla(\sigma\_s)) \preceq \hat{\nabla}\left(r(\sigma\_0), p\_d(\sigma\_s)\right) \prec r(\sigma\_0) \tag{4}$$

The rightmost inequality ensures that a mini-trace that starts from ∇(σs) is of lower rank than σ0, and as such extends the "rank decrease" requirement to all mini-traces. Based on this restriction, we can formulate a recurrence for comp_x based on the initial rank ρ = r(σ0), as follows:

$$comp\_x(\rho) \le \sum\_{i=1}^{d} comp\_x\left(\hat{\nabla}(\rho, i)\right) + (d-1) + \hat{k} \cdot b \tag{5}$$

where b, as before, is the number of k-steps for which k > 1, and k̂ is the bound on k (k ≤ k̂). The expression (d − 1) represents the transitions between segments, and k̂ · b represents the surplus of the ρ-rank trace over the total lengths of the mini-traces.

It should be clear from the definition above that ∇̂ is quite intricate. How would we compute it effectively? The rank decrease of the initial states and the simulation properties were *local* by nature, and thus amenable to validation with an SMT solver. The ∇̂ function is inherently *global*, defined w.r.t. an entire trace. This makes the property (4) challenging for verification methods based on SMT. To render this check more feasible with first-order reasoning, we introduce two special cases where the problem of checking (4) becomes easier: rank preservation and a single segment, explained next.

*Taming* ∇̂ *with rank preservation* To obtain rank preservation, we extend the rank function to all states (instead of just the initial states), and require that the rank is preserved along transitions. This is appropriate in some of the scenarios we encountered. For example, the binary counter illustration satisfies the property that along any execution {σi}_{i=0}^∞, the rank is preserved: r(σi) = r(σi+1). Rank preservation means that given a switch state σs of an arbitrary segment i, we know that r(σs) = r(σ0). Once this is set, ∇̂ only needs to overapproximate the rank of ∇(σs) in terms of the rank of the same state σs.

*Taming* ∇̂ *with a single segment* In this case, checking (4) reduces to a single check of the initial state, which is the only switch state. It turns out that the restriction to a single segment is still expressive enough to handle many loop types.

*Putting it all together* Theoretically, r, ∇, p_d, and ∇̂ can be written manually by the user. However, this is a rather tedious task that is straightforward enough to be automated. We observed that all the aforementioned functions are simple enough that they can be expressed in a strict syntax using first-order logic. Similarly to [22], we apply a generate-and-test synthesis procedure to enumerate a space of possible expressions representing them. This process is explained in Section 4.

# 3 Complexity Analysis based on Squeezers

In this section we develop the formal foundations of our approach for extracting recurrence relations describing the time complexity of an imperative program based on state squeezers. We present the ingredients that underlie the approach, the conditions they are required to satisfy, and the recurrence relations they induce. In the next section, we explain how to extract the recurrences automatically. Given the recurrence relation, a dedicated (external) tool may be applied to end up with a closed formula, similar to [3].

We use *transition systems* to capture the semantics of a program.

Definition 1 (Transition Systems). *A transition system is a tuple* (Σ, init, tr)*, where* Σ *is a set of* states*,* init ⊆ Σ *is a set of* initial states *and* tr : Σ → Σ *is a* transition function *(rather than a transition relation, since only deterministic procedures are considered). The set of* terminal states F ⊆ Σ *is implicitly defined by* tr(σ) = σ*. An* execution trace *(or a* trace *in short) is a finite or infinite sequence of states* τ = σ0, σ1,... *such that* σi+1 = tr(σi) *for every* 0 ≤ i < |τ|*. A state* σ ∈ Σ *defines an execution trace* τ(σ) = {tr^i(σ)}_{i∈ℕ}*. Whenever there exists an index* 0 ≤ k ≤ |τ| *s.t.* σk ∈ F*, we truncate* τ(σ) *into a finite trace* {tr^i(σ)}_{i=0}^{k}*, where* k *is the minimal such index. The trace is* initial *if it starts from an initial state, i.e.,* σ ∈ init*. Unless explicitly stated otherwise, all traces we consider are initial. The set of* reachable states *is* reach = {σ ∈ Σ | ∃σ0 ∈ init . σ ∈ τ(σ0)}*.*

Roughly, to represent a program by a transition system, we translate it into a single loop program, where init consists of the states encountered when entering the loop, and transitions correspond to iterations of the loop.

In the sequel, we fix a transition system (Σ, init, tr ) with a set F of terminal states and a set reach of reachable states.

Definition 2 (Complexity over states). *For a state* σ ∈ Σ*, we denote by* comps(σ) *the number of transitions from* σ *to a terminal state along* τ (σ) *(the trace that starts from* σ*). Formally, if* τ (σ) *does not include a terminal state, i.e., the procedure does* not *terminate from* σ*, then* comps(σ) = ∞*. Otherwise:*

$$comp\_s(\sigma) = \min\{k \in \mathbb{N} \mid tr^k(\sigma) \in F\}.$$

*The complexity function of the program maps each initial state* σ0 ∈ init *to its time complexity* comp_s(σ0) ∈ ℕ ∪ {∞}*.*

Our complexity analysis derives a recurrence relation for the complexity function by expressing the length of a trace in terms of the lengths of traces that start from lower-rank states. This is achieved by (i) attaching to each initial state a *rank* from a well-founded set that we use as the argument of the complexity function and that we recur over, and (ii) defining a *squeezer* that maps each state from the original trace to a state in a lower-rank trace; the mapping forms a *partitioned simulation* according to a *partition function* that decomposes a trace into segments; each segment is simulated by a (separate) lower-rank trace, allowing us to express the length of the former in terms of the latter, and finally, (iii) defining a *rank bounding function* that expresses (an upper bound on) the ranks of the lower-rank traces in terms of the rank of the higher-rank trace. We elaborate on these components next.

### 3.1 Time complexity as a function of rank

We start by defining a rank function that allows us to express the time complexity of an initial state by means of its rank.

Definition 3 (Rank). *Let* X *be a set, and* ≺ *be a well-founded partial order over* X*. Let* B ⊇ min(X) *be a* base *for* X*, where* min(X) *is the set of all the minimal elements of* X *w.r.t.* ≺*. A* rank function r : init → X *maps each initial state to a rank in* X*. We extend the notion of a rank to initial traces as follows. Given an initial trace* τ = τ (σ0)*, we define its rank to be the rank of* σ0*. We refer to states* σ<sup>0</sup> *such that* r(σ0) ∈ B *as the* base states*. Similarly, (initial) traces whose ranks are in* B *are called* base traces*.*

In our analysis, ranks range over X = ℕ^m (for some m ∈ ℕ⁺), with ≺ defined by the lexicographic order. Ranks let us abstract away data inside the initial execution states which does *not* affect the worst-case bound on the trace length. For example, the length of traces of the binary counter program (Fig. 1) is completely agnostic to the actual content of the array at the initial state. The only parameter that affects its trace length is the array size, and not which integers are stored inside it. Hence, a suitable rank function in this example maps an initial state to its array length. This is despite the fact that the execution does depend on the content of the array, and, in particular, the number of remaining iterations from an intermediate state within the execution depends on it. The partial order ≺ and the base set B will be used to define the recurrence formula as we explain in the sequel.

We will assume from now on that (X, ≺, B), as well as the rank function, are fixed, and can be understood from context. The rank function r induces a complexity function comp_x : X → ℕ ∪ {∞} over ranks, defined as follows.

Definition 4 (Complexity over ranks). *The complexity function over ranks,* comp_x : X → ℕ ∪ {∞}*, is defined by:*

$$comp\_x(\rho) = \max\{comp\_s(\sigma\_0) \mid r(\sigma\_0) \preceq \rho \land \sigma\_0 \in init\}$$

The definition ensures that for every initial state σ0 ∈ init, we can compute (an upper bound on) its time complexity based on its rank, as follows: comp_s(σ0) ≤ comp_x(r(σ0)). The complexity of ρ takes into account all states with r(σ0) ⪯ ρ and not only those with rank exactly ρ, to ensure monotonicity of comp_x in the rank (i.e., if ρ1 ⪯ ρ2 then comp_x(ρ1) ≤ comp_x(ρ2)). Our approach is targeted at extracting a recurrence relation for comp_x.

### 3.2 Complexity decomposition by partitioned simulation

In order to express the length of a trace in terms of the lengths of traces of lower ranks, we use a *squeezer* that maps states from the original trace to states of lower-rank traces and (implicitly) induces a correspondence between the original trace and the lower-rank trace(s). For now, we do not require the squeezer to decrease the rank of the trace; this requirement will be added later. The squeezer is accompanied by a partition function to form a *partitioned simulation* that allows a single higher-rank trace to be matched to multiple lower-rank traces such that their lengths may be correlated.

Definition 5 (Squeezer, ∇). *A squeezer is a function* ∇ : Σ → Σ*.*

Definition 6. *A function* p_d : Σ → {1,...,d}*, where* d ∈ ℕ⁺*, is called a* d-partition function *if for* every *trace* τ = σ0, σ1,... *it holds that* p_d(σi+1) ≥ p_d(σi) *for every* 0 ≤ i < |τ|*.*

The partition function partitions a trace into a bounded number of *segments*, where each segment consists of states with the same value of p_d. We refer to the first state of a segment as a *switch state*, and to the last state of a finite segment as a *last state* (note that if τ is infinite, its last segment has no last state). In particular, this means that the initial state of a trace is a switch state. (Note that a state may be a switch state in one trace but not in another, while a last state is a last state in any trace, as long as the same partition function is considered.)

Our complexity analysis requires the squeezer to form a partitioned simulation with respect to p_d. Roughly, this means that the squeezer maps each segment of a trace to a (lower-rank) trace that "simulates" it. To this end, we require *all* the states σ within a segment of a trace to be (h, ℓ)-"stuttering", for some h ≥ ℓ ≥ 1. Stuttering lets h consecutive transitions of σ be matched to ℓ consecutive transitions of its squeezed counterpart. If h = ℓ, the state σ contributes to the complexity the same number of steps as the squeezed state. Otherwise, σ contributes h − ℓ additional steps, resulting in a longer trace. Recall that terminal states also have outgoing transitions (to themselves); however, these transitions do not capture actual steps: they do not contribute to the complexity. Hence, stuttering also requires that "real" transitions of σ are matched to "real" transitions of its squeezed counterpart, namely, if the latter encounters a terminal state, so must the former. For the last states of segments the requirement is slightly different, as the simulation ends at the last state, and a new simulation begins in the next segment. In order to account for the transition from the last state of one segment to the first (switch) state of the next segment, last states are considered (2, 1)-stuttering if they are squeezed into terminal states, unless they are terminal themselves<sup>4</sup>. In any other case, they are considered (1, 1)-stuttering. The formal definitions follow.

Definition 7 (Stuttering States). *A non-last state* σ ∈ Σ *is called an* (h, ℓ)*-*stuttering *state, for* h ≥ ℓ ≥ 1*, if: (i)* tr^ℓ(∇(σ)) = ∇(tr^h(σ))*; (ii) for every* i < ℓ*,* tr^i(∇(σ)) ∉ F*; (iii)* tr^ℓ(∇(σ)) ∈ F *implies that* tr^h(σ) ∈ F*. A last state* σ ∈ Σ *is* (1, 1)*-*stuttering *if* σ ∈ F *or* ∇(σ) ∉ F*. Otherwise, it is* (2, 1)*-*stuttering*.*
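On a concrete system, the conditions of Definition 7 can be checked by bounded execution. A sketch of such a check (our helper, illustrated on a toy countdown system where tr(s) = max(s − 1, 0), F = {0}, and the squeezer drops one unit):

```python
def stuttering_pair(sigma, tr, squeeze, is_terminal, h_max=3):
    """Search for the smallest (h, l), h >= l >= 1, satisfying the three
    conditions of Definition 7 for a given state sigma; None if none fits."""
    for h in range(1, h_max + 1):
        for l in range(1, h + 1):
            side = [squeeze(sigma)]          # squeezed side: l steps
            for _ in range(l):
                side.append(tr(side[-1]))
            if any(is_terminal(s) for s in side[:-1]):
                continue                     # (ii) violated
            rhs = sigma                      # original side: h steps
            for _ in range(h):
                rhs = tr(rhs)
            if side[-1] != squeeze(rhs):
                continue                     # (i) violated
            if is_terminal(side[-1]) and not is_terminal(rhs):
                continue                     # (iii) violated
            return (h, l)
    return None

# Toy countdown system (an illustration, not the paper's example):
tr_c = lambda s: max(s - 1, 0)       # transition function
term_c = lambda s: s == 0            # terminal states F
sq_c = lambda s: max(s - 1, 0)       # squeezer: drop one unit of rank
```

For this toy system, an interior state such as 5 is classified (1, 1)-stuttering, while the state 2, whose squeezed counterpart terminates one step early, is classified (2, 1)-stuttering.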

To obtain a partitioned simulation, switch states (along any trace), which start new segments, are further required to be squeezed into initial states (since our complexity analysis only applies to initial states). We denote by S_{p_d}(τ) the switch states of trace τ according to the partition p_d, and by S_{p_d} the switch states of *all* traces according to the partition p_d. Namely, S_{p_d} = init ∪ {tr(σ) | σ ∈ reach ∧ p_d(σ) < p_d(tr(σ))}.

Definition 8 (Partitioned Simulation). *We say that a squeezer* ∇ : Σ → Σ *forms a* {(h_i, ℓ_i)}_{i=1}^{n}-partitioned simulation *according to* p_d*, denoted* ∇ ∼ PS_{p_d}({(h_i, ℓ_i)}_{i=1}^{n})*, if for every reachable state* σ *we have that:*

– σ *is* (h_i, ℓ_i)*-stuttering for some* 1 ≤ i ≤ n*, and*
– σ ∈ S_{p_d} ⇒ ∇(σ) ∈ init*.*

Note that Definition 7 implies that a non-terminal state may only be squeezed into a terminal state if it is the last state in its segment. When {(h_i, ℓ_i)}_{i=1}^{n} is irrelevant or clear from the context, we omit it from the notation and simply write ∇ ∼ PS_{p_d}.

<sup>4</sup> Considering a non-terminal last state that is squeezed into a terminal state as (1, 0)-stuttering may have been more intuitive than (2, 1)-stuttering, but both properly capture the discrepancy between the number of transitions in the higher and lower rank traces, and (2, 1) better fits the rest of the technical development, which assumes that h_i, ℓ_i ≥ 1.

A trace squeezed by ∇ ∼ PS_{p_d}({(h_i, ℓ_i)}_{i=1}^{n}) may have an unbounded number of (h_i, ℓ_i)-stuttering states, which hinders the ability to define a recurrence relation based on the simulation. To overcome this, our complexity decomposition may use k̂ ≥ 1 to capture a common multiplicative factor of *all* the stuttering pairs, with the target of leaving only a *bounded* number of states whose stuttering exceeds k̂ and needs to be added separately. This will become important in Theorem 1.

Observation 1 (Complexity decomposition) *Let* ∇ ∼ PS_{p_d}({(h_i, ℓ_i)}_{i=1}^{n})*, and* k̂ ≥ 1*. Let* E_k̂ ⊆ {1,...,n} *be the set of indices such that* h_i/ℓ_i > k̂*. Then for every* σ0 ∈ init *we have that*

$$\operatorname{comp}\_s(\sigma\_0) \le \sum\_{\sigma \in \mathbb{S}\_{p\_d}(\tau(\sigma\_0))} \widehat{k} \cdot \operatorname{comp}\_s(\nabla(\sigma)) + \sum\_{i \in \mathbb{E}\_{\widehat{k}}} \sum\_{\sigma \in \mathbb{K}\_i(\tau(\sigma\_0))} \left(h\_i - \ell\_i \cdot \widehat{k}\right)$$

*where* K_i(τ(σ0)) *is the multiset of* (h_i, ℓ_i)*-stuttering states in* τ(σ0)*.*

In the observation, the first addend summarizes the complexity contributed by all the lower-rank traces, while using k̂ as an upper bound on the "inflation" of the traces. However, the states that are (h_i, ℓ_i)-stuttering with h_i/ℓ_i exceeding k̂ contribute an additional h_i − (ℓ_i · k̂) steps to the complexity, and as a result, need to be taken into account separately. This is handled by the second addend, which adds the steps that were not accounted for by the first addend. While we use the same inflation factor k̂ across the entire trace, a simple extension of the decomposition property may consider a different factor k̂ in each segment. Note that the first addend always sums over a finite number of elements since the number of switch states is at most d – the number of segments. If τ(σ0) is finite, the second addend also sums over a finite number of elements.

Observation 1 considers the complexity function over states, and is oblivious to the rank. In particular, it does not rely on the squeezer decreasing the rank of states. Next, we use this observation as the basis for extracting a recurrence relation for the complexity function over ranks, in which case, decreasing the rank becomes important.

### 3.3 Extraction of recurrence relations over ranks

Based on the complexity decomposition, we define recurrence relations that capture comp_x — the time complexity of the initial states as a function of their ranks. To go from the complexity as a function of the actual states (as in Observation 1) to the complexity as a function of their ranks, we need to express the rank of ∇(σs) for a switch state σs as a function of the rank of σ0. To this end, we define ∇̂:

Definition 9. *Given* r*,* ∇*, and* p_d *such that* ∇ ∼ PS_{p_d}*, a function* ∇̂ : X × {1,...,d} → X *is a* rank bounding function *if for every* ρ ∈ X − B *and* 1 ≤ i ≤ d*, if* τ(σ0) *is an initial trace such that* r(σ0) = ρ*, and* σs ∈ S_{p_d}(τ(σ0)) *is a switch state such that* p_d(σs) = i*, the following holds:*

*(i) upper bound:* r(∇(σs)) ⪯ ∇̂(ρ, i)*, and (ii) rank decrease:* ∇̂(ρ, i) ≺ ρ*.*

In other words, Definition 9 requires that for every non-base initial state σ0 ∈ init and switch state σs at segment i of τ(σ0), we have that r(∇(σs)) ⪯ ∇̂(r(σ0), i) ≺ r(σ0). Recall that r(∇(σs)) is well defined since ∇(σs) is required to be an initial state. The definition states that ∇̂(ρ, i) provides an upper bound on the rank of squeezed switch states in a non-base trace of rank ρ; comp_x(r(∇(σs))) ≤ comp_x(∇̂(ρ, i)) is then ensured by the monotonicity of comp_x. The definition also requires the rank of non-base traces to strictly decrease when they are squeezed, as captured by the "rank decrease" inequality.

Obtaining a rank bounding function, or even verifying that a given ˆ satisfies this requirement, is a challenging task. We return to this question later in this section.

These conditions allow us to substitute ranks for states in the first addend of Observation 1, and hence obtain recurrence relations for comp_x over the (decreasing) ranks. To handle the second addend, we also need to bound the number of states whose stuttering, h_i/ℓ_i, exceeds k̂. This is summarized by the following theorem:

Theorem 1. *Let* r : init → X *be a rank function,* ∇ : Σ → Σ *a squeezer, and* p_d : Σ → {1,...,d} *a partition function such that* ∇ ∼ PS_{p_d}({(h_i, ℓ_i)}_{i=1}^{n})*. Let* ∇̂ : X × {1,...,d} → X *be a rank bounding function w.r.t.* r *and* p_d*. If, for some* k̂ ≥ 1*, the number of* (h_i, ℓ_i)*-stuttering states that appear along* any *non-base initial trace is bounded by a constant* b_i ∈ ℕ *whenever* i ∈ E_k̂*, then*

$$\operatorname{comp}\_x(\rho) \le \sum\_{i=1}^d \widehat{k} \cdot \operatorname{comp}\_x \left( \widehat{\triangledown}(\rho, i) \right) + \sum\_{i \in \mathbb{E}\_{\widehat{k}}} b\_i \cdot \left( h\_i - \ell\_i \cdot \widehat{k} \right). \tag{6}$$

Note that a state may be (hᵢ, ℓᵢ)-stuttering for several i's, in which case it is sound to count it towards any of the bᵢ's; in particular, we choose the one that minimizes hᵢ − ℓᵢ·k̂.

Corollary 1. *Under the premises of Theorem 1, if* f : X → ℕ ∪ {∞} *satisfies* f(ρ) = Σⁿᵢ₌₁ k̂ · f(▽̂(ρ, i)) + Σᵢ∈E_k̂ bᵢ · (hᵢ − ℓᵢ · k̂) *for every* ρ ∈ X − B*, and* compx(ρ) ≤ f(ρ) *for every* ρ ∈ B*, then* compx(ρ) ≤ f(ρ) *for every* ρ ∈ X*. We conclude that* comps(σ₀) ≤ f(r(σ₀)) *for every* σ₀ ∈ init*.*
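Concretely, once ▽̂, k̂ and the surplus terms are fixed, the recurrence of Cor. 1 can be unfolded numerically by memoized recursion. The sketch below instantiates a hypothetical single-segment case (d = 1) with ▽̂(ρ) = ρ − 1, k̂ = 2 and a constant surplus term; all concrete values are illustrative and not taken from any example in the text.

```python
from functools import lru_cache

D = 1        # number of segments (illustrative)
K_HAT = 2    # the bound on h_i / l_i (illustrative)
SURPLUS = 3  # a constant surplus term b_i * (h_i - l_i * K_HAT) (illustrative)

def rank_bound(rho, i):
    """Hypothetical rank bounding function: each segment drops the rank by one."""
    return rho - 1

@lru_cache(maxsize=None)
def f(rho):
    if rho == 0:  # base rank: assume comp_x is bounded by 1 there
        return 1
    # Cor. 1: f(rho) = sum_i K_HAT * f(rank_bound(rho, i)) + surplus
    return sum(K_HAT * f(rank_bound(rho, i)) for i in range(1, D + 1)) + SURPLUS

# With d = 1 this unfolds to f(rho) = 2*f(rho - 1) + 3 = 4*2**rho - 3.
print([f(r) for r in range(5)])  # [1, 5, 13, 29, 61]
```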

*Base-case complexity* In order to apply Cor. 1, we need to accompany Eq. (6) with a bound on compx(ρ) for the base ranks, ρ ∈ B. Fortunately, this is usually a significantly easier task. In particular, the running time of the base cases is often constant because, intuitively, the following are correlated: (a) the rank, (b) the size of the underlying data structure, and (c) the number of iterations. In this case, symbolic execution may be used to obtain bounds for the base cases (as we do in our work). In essence, any method that can yield a closed-form expression for the complexity of the base cases is viable; in particular, we can apply our technique to the base case as a subproblem.

### 3.4 Establishing the requirements of the recurrence relations extraction

Theorem 1 defines a recurrence relation from which an upper bound on the complexity function, compx, can be computed (Cor. 1). However, to ensure correctness, the premises of Theorem 1 must be verified. The requirement that ▽ ∼PS_pd {(hᵢ, ℓᵢ)}ⁿᵢ₌₁ (see Definition 8) may be verified *locally* by examining individual (reachable) states: for any (reachable) state σ, the checks for (hᵢ, ℓᵢ)-stuttering and for switch states can, and should, be done in tandem, and require observing at most maxᵢ hᵢ transition steps from σ and at most maxᵢ ℓᵢ from ▽(σ). In contrast, the property required of ▽̂ is *global*: it requires ▽̂(ρ, i) to provide an upper bound on the rank of *any* squeezed switch state that may occur in *any* position along *any* non-base initial trace whose initial state has rank ρ. Similarly, the property required of the bounds bᵢ is also *global*: that the number of (hᵢ, ℓᵢ)-stuttering states along *any* non-base initial trace is at most bᵢ. It is therefore not clear how these requirements may be verified in general. We overcome this difficulty by imposing additional restrictions, as we discuss next.

*Establishing bounds on the number of occurrences of stuttering states* Bounds on the number of occurrences *per trace* that are sound *for every trace* are difficult to obtain in general. While clever analysis methods exist that can do this kind of accounting, we found that a stronger, simpler condition applies in many cases: every state σ along a non-base initial trace satisfies one of the following:

	- σ is (hᵢ, ℓᵢ)-stuttering with hᵢ/ℓᵢ ≤ k̂; *or*
	- σ is (hᵢ, ℓᵢ)-stuttering (with hᵢ/ℓᵢ > k̂), *and* either σ is a switch state or tr<sup>hᵢ</sup>(σ) is a last state.

This restricts these cases to occur only at the beginnings and ends of segments. It implies a total bound of 2d · maxᵢ(hᵢ − ℓᵢ·k̂) on the "surplus" of any trace; therefore, we substitute this expression for the rightmost sum in Eq. (6).

*Validating a rank bounding function* The definition of a rank bounding function (Definition 9) encapsulates two parts. Part (ii) ensures that the rank decreases: ▽̂(ρ, i) ≺ ρ for every ρ ∈ X − B. Verifying that this requirement holds does not involve any reasoning about the states, or traces, of the transition system. Part (i) ensures that ▽̂ provides an upper bound on the rank of squeezed switch states. Formally, it requires that r(▽(σₛ)) ⪯ ▽̂(r(σ₀), i) for every switch state σₛ in segment i ∈ {1,...,d} along a trace that starts from a non-base initial state σ₀. Namely, it relates the rank of the squeezed switch state, ▽(σₛ), to the rank of the initial state, σ₀, where no bound on the length of the trace between the initial state σ₀ and the switch state σₛ is known a priori. As such, it involves global reasoning about traces. We identify two cases in which such reasoning may be avoided: (i) the partition pd consists of a single segment (i.e., d = 1); or (ii) the rank function extends to *any* state (and not just the initial states), while being preserved by tr. In both of these cases, we are able to verify the correctness of ▽̂ locally.

*A single segment.* In this case, the only switch state along a trace is the initial state, and hence the upper-bound requirement of ▽̂ boils down to the requirement that for every σ₀ ∈ init such that r(σ₀) ∈ X − B, we have that r(▽(σ₀)) ⪯ ▽̂(r(σ₀), 1).

Lemma 1. *Let* r, ▽*, and* p₁ : Σ → {1} *be such that* ▽ ∼PS_p₁*. Then* ▽̂ : X × {1} → X *satisfies the upper-bound requirement of a rank bounding function if and only if* r(▽(σ₀)) ⪯ ▽̂(r(σ₀), 1) *for every* σ₀ ∈ init *such that* r(σ₀) ∈ X − B*.*

*Rank preservation.* Another case in which the upper-bound property of ▽̂ may be verified locally is when r can be extended to *all* states while being preserved by tr:

Definition 10. *A function* r̂ : Σ → X extends *the rank function* r : init → X *if* r̂ *agrees with* r *on the initial states, i.e.,* r̂(σ₀) = r(σ₀) *for every initial state* σ₀ ∈ init*. The extended rank function* r̂ *is* preserved by tr *if, for every reachable state* σ*, we have that* r̂(tr(σ)) = r̂(σ)*.*

Preservation of r̂ by tr ensures that all states along a (reachable) trace share the same rank. In particular, for a reachable switch state σₛ that lies along τ(σ₀), rank preservation ensures that r̂(σₛ) = r̂(σ₀) = r(σ₀) (the last equality is due to the extension property), allowing us to recover the rank of σ₀ from the rank of σₛ. Therefore, the upper-bound requirement of ▽̂ simplifies into the *local* requirement that for every reachable switch state σₛ such that r̂(σₛ) ∈ X − B, we have that r̂(▽(σₛ)) ⪯ ▽̂(r̂(σₛ), i) for every i ∈ {1,...,d}.

Lemma 2. *Let* r, ▽*, and* pd : Σ → {1,...,d} *be such that* ▽ ∼PS_pd*. Suppose that* r̂ : Σ → X *extends* r *and is preserved by* tr*. Then* ▽̂ : X × {1,...,d} → X *satisfies the upper-bound requirement of a rank bounding function if and only if* r̂(▽(σₛ)) ⪯ ▽̂(r̂(σₛ), i) *for every reachable switch state* σₛ *such that* r̂(σₛ) ∈ X − B *and for every* i ∈ {1,...,d}*.*
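For finite-state instances, the local conditions of Lemma 2 (rank preservation plus the per-switch-state bound) can be checked by brute force over the reachable states. The following sketch does this for a toy countdown system; the squeezer, rank, and switch-state predicate are all hypothetical choices made for illustration.

```python
from collections import deque

# Toy transition system: a state is (n, c); c counts down from n to 0.
def tr(s):
    n, c = s
    return (n, c - 1) if c > 0 else None  # None marks a terminal state

def reachable(inits):
    seen, work = set(inits), deque(inits)
    while work:
        t = tr(work.popleft())
        if t is not None and t not in seen:
            seen.add(t)
            work.append(t)
    return seen

# Hypothetical ingredients, chosen for this toy system only:
r_hat = lambda s: s[0]                               # extended rank
squeeze = lambda s: (s[0] - 1, min(s[1], s[0] - 1))  # drop the bound by one
rank_bound = lambda rho, i: rho - 1                  # rank bounding function
is_switch = lambda s: s[1] == s[0]                   # switch states
is_base = lambda rho: rho == 0

def check_local_conditions(inits, d=1):
    for s in reachable(inits):
        t = tr(s)
        if t is not None and r_hat(t) != r_hat(s):   # rank preservation
            return False
        if is_switch(s) and not is_base(r_hat(s)):   # Lemma 2's local check
            if any(r_hat(squeeze(s)) > rank_bound(r_hat(s), i)
                   for i in range(1, d + 1)):
                return False
    return True

print(check_local_conditions([(n, n) for n in range(1, 6)]))  # True
```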

*Remark 1.* The notion of a partitioned simulation requires a switch state σₛ to be squeezed into an initial state. This requirement may be relaxed into the requirement that σₛ is squeezed into a *reachable* state ▽(σₛ), provided that we are still able to ensure that the rank of (some) *initial* state σ′ leading to ▽(σₛ) is smaller than the rank of the trace on which σₛ lies, and that the rank of σ′ is properly captured by ▽̂. One case in which this is possible is when r is extended to an r̂ that is preserved by tr, as in this case r̂(▽(σₛ)) = r̂(σ′) = r(σ′).

This subsection described *local* properties that ensure that a given program satisfies the requirements of Theorem 1. The locality of these properties facilitates the use of SMT solvers to perform the checks automatically, a key step for the effective application of the method.

### 3.5 Trace-length vs. state-size recurrences with squeezers

A plethora of work exists on analyzing the complexity of programs (see Section 6 for a discussion of related work). Most existing techniques for automatic complexity analysis aim to find a recurrence relation on the length of the execution trace, relating the length of a trace from some state to the length of the remaining trace starting at its successor. These are recurrences over *time*, if you will, whereas our approach generates recurrences over the state *size* (captured by the rank). Is our approach completely orthogonal to preceding methods? Not quite. It turns out that, from a conceptual point of view, our approach can formulate a recurrence over time as well, as we demonstrate in this section.

*Obtaining trace-length recurrences based on state squeezers* The key idea is to use tr itself as a squeezer that squeezes each state into its immediate successor. Putting aside the initial-anchor requirement momentarily, such a squeezer forms a partitioned simulation with a single segment (i.e., pd ≡ 1), in which all the states along a trace are (1, 1)-stuttering, except for the last one (if the trace is finite), which is (2, 1)-stuttering. Recall that squeezers must also preserve initial states (see Definition 8), a property that may be violated when ▽ = tr, as the successor of an initial state is not necessarily an initial state. We restore the initial-anchor property by setting init′ = Σ, i.e., every state is considered an initial state<sup>5</sup>.

A consequence of this definition is that compx now provides an upper bound on the time complexity of *every* state, and not only of the initial states, in terms of a rank that is yet to be defined. If we further define a rank bounding function ▽̂, we may extract a recurrence relation of the form

$$\operatorname{comp}\_x(\rho) = \operatorname{comp}\_x(\widehat{\triangledown}(\rho)) + 1$$

(we use ▽̂(ρ) as an abbreviation of ▽̂(ρ, 1), since this is a special case where d = 1).

*Defining the rank and the rank bounding function* Recall that the rank r : Σ → X captures the features of the (initial) states that determine the complexity. To allow maximal precision, especially since *all* states are now initial, we set X to be the set of *states* Σ, and define r to be the identity function, r(σ) = σ. With this definition, compx and comps become one. Next, we need to define ≺ and B, while ensuring that ▽ squeezes the (non-base) initial states, which are now *all* the states, into states of a lower rank according to ≺. Since squeezers now act like transitions, given that ▽ = tr, they have the effect of decreasing the number of transitions remaining to reach a terminal state (provided that the trace is finite). We use this observation to define ≺ ⊆ Σ × Σ. Care is needed to ensure that (Σ, ≺) is well-founded, i.e., that every descending chain is finite, even though the program may *not* terminate. Here is the definition that achieves this goal:

$$
\sigma\_1 \prec \sigma\_2 \iff \operatorname{comp}\_s(\sigma\_1) < \operatorname{comp}\_s(\sigma\_2) \tag{7}
$$

Since ▽ = tr does not decrease comps for states that belong to infinite (nonterminating) traces (comps(▽(σ)) = comps(σ) = ∞, hence ▽(σ) ⊀ σ), they must be included in B, together with the terminal states, which are minimal w.r.t. ≺. Namely, B = F ∪ {σ | comps(σ) = ∞}. Technically, this means that the base of the recurrence needs to define compx for these states.

The final piece of the puzzle is setting ▽̂ = tr. Since ▽ ∼PS_pd {(1, 1), (2, 1)} (when init′ = Σ), where the number of (2, 1)-stuttering states that appear along any non-base initial trace is bounded by 1, we may use Theorem 1, setting k̂ = 1, to derive the following recurrence relation, which reflects induction over time:

$$\operatorname{comp}\_x(\sigma) = \operatorname{comp}\_x(\operatorname{tr}(\sigma)) + 1.$$

<sup>5</sup> In fact, it suffices to consider init′ = reach, in which case we may be able to take advantage of information from static analyses.

The formulation above represents a degenerate, naïve choice of ingredients, made for the sake of a theoretical construction whose purpose is to lay the foundation for a general framework that draws its strength from both induction over time and induction over rank. This construction does not exploit the full flexibility of our framework. In particular, ranking functions obtained from termination proofs, as used in [5], may be used to augment the rank in this setting. Further, invariants inferred by static analysis can be used to refine the recurrences.

# 4 Synthesis

So far we have assumed that the rank function r, partition function pd, squeezer ▽ and rank bounding function ▽̂ are all readily available. Clearly, they are specific to a given program, and it would be too tedious for a programmer to provide these functions for the analysis of the underlying complexity. In this section we show how to automate the process of obtaining (r, pd, ▽, ▽̂) for a class of typical looping programs. We take advantage of the fact that these components are much more compact than other kinds of auxiliary functions commonly used for resource analysis, such as monotonically decreasing measures used as ranking functions. For example, a ranking function for the binary counter program shown in Fig. 1 is:

$$m(n,i,c) = \left(n \cdot \sum\_{j=0}^{n-1} 2^j \cdot c[j]\right) + (2^i - 1) + (n - i)$$

whereas the rank, partition, squeezer ▽ and rank bounding function ▽̂ are:

$$\begin{array}{ll} r(n,i,c) = n & \triangledown(n,i,c) = \left(n-1,\; (i \ge n)\ ?\ i-1 : i,\; c[:n-1]\right) \\ \widehat{\triangledown}(\rho) = \rho - 1 & p\_d(n,i,c) = \left(i \ge n \mid c[n-1]\right)\ ?\ 2 : 1 \end{array}$$
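These ingredients are simple enough to execute directly. The sketch below implements the squeezer ▽ for binary-counter states (n, i, c) and checks the rank decrease r(▽(σ)) = r(σ) − 1; the tuple-and-list state encoding is our own illustrative choice.

```python
def squeeze(state):
    """The squeezer for the binary counter: drop the top bit and clamp i."""
    n, i, c = state
    return (n - 1, i - 1 if i >= n else i, c[:n - 1])

rank = lambda state: state[0]     # r(n, i, c) = n
rank_bound = lambda rho: rho - 1  # rank bounding function

state = (4, 4, [1, 1, 0, 0])      # an example state with n = 4 bits
small = squeeze(state)
print(small)                      # (3, 3, [1, 1, 0])
assert rank(small) == rank_bound(rank(state))  # the rank strictly decreases
```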

This enables the use of a relatively naïve enumerative approach of multi-phase generate-and-test, employing some early pruning to discard obviously non-qualifying candidates.
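The generate-and-test loop can be sketched as follows: candidate rank expressions are enumerated from a tiny linear grammar and pruned early by testing the rank-decrease property on a handful of sample states under a fixed squeezer. The grammar, sample states and squeezer below are illustrative placeholders, far smaller than the tool's actual search space.

```python
from itertools import product

# Tiny grammar of candidate ranks: linear combinations a*n + b*i, small coeffs.
def candidates():
    for a, b in product(range(-1, 2), repeat=2):
        if (a, b) != (0, 0):
            yield (a, b), (lambda s, a=a, b=b: a * s[0] + b * s[1])

def squeeze(state):  # fixed squeezer, as in the binary counter example
    n, i, c = state
    return (n - 1, i - 1 if i >= n else i, c[:n - 1])

samples = [(4, 4, [1, 1, 0, 0]), (3, 1, [0, 1, 0]), (5, 5, [1] * 5)]

def qualifies(r):
    # Early pruning: the rank must stay non-negative and strictly decrease
    # under squeezing, on every sample state.
    return all(r(s) >= 0 and r(squeeze(s)) < r(s) for s in samples)

surviving = [coeffs for coeffs, r in candidates() if qualifies(r)]
print(surviving)  # [(1, 0), (1, 1)] -- i.e. r = n and r = n + i survive
```

Surviving candidates would then be passed to the (unbounded) verification phase described in Section 4.2.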

# 4.1 SyGuS

The generation step of the synthesis loop applies syntax-guided synthesis (SyGuS [7]). As in any other SyGuS method, defining the underlying grammars is more art than science: a grammar should be expressive enough to capture the desired terms, but strict enough to effectively bound the search space.

*Ranks* are taken from ℕ<sup>m</sup>, where m ∈ {1, 2, 3} and ≺ is the usual lexicographic order. The rank function r comprises one expression per coordinate, constructed by adding and subtracting integer variables and array sizes. Boolean variables are not used in rank expressions.

*Partition functions* pd*.* Our implementation currently supports a maximum of two segments. This means that the partition function only assigns the values 1 and 2, and we synthesize it by generating a condition over the program's variables, cond, that selects between them: pd(σ) = cond(σ) ? 2 : 1. Handling at most two segments is *not* an inherent limitation; we simply found that for typically occurring programs, two segments suffice.

*Squeezers* ▽ are the only ingredient that requires substantial synthesis effort. We represent squeezers as small loop-free imperative programs, which are natural for expressing state transformations. We use a rather standard syntax with 'if-then-else' and assignments, plus a remove-adjust operation that removes array entries and adjusts the indices relating to them accordingly.

*Rank bounding functions* ▽̂*.* With a well-chosen squeezer ▽, it suffices to consider quite simple rank bounds for the mini-traces. Hence, the rank bounds defined by ▽̂ are obtained by adding, subtracting and multiplying variables with small constants (for each coordinate of the rank). As with the choice of ranks, targeting simple expressions for ▽̂ helps reduce the complexity of the final recurrence generated by the process.

### 4.2 Verification

For the sake of verifying the synthesized ingredients, we fix a set {(hᵢ, ℓᵢ)} of stuttering shapes, and check the requirements of Theorem 1 as discussed in Section 3.4. In particular, we check that pd is weakly monotone, i.e., that cond cannot change from true to false in any step of tr. Note that some of the properties may be used to discard candidate ingredients independently of the others. For example, the simulation requirement depends only on ▽ and pd.

*Unbounded verification* Once candidates pass a preliminary screening phase, they are verified by encoding the program and all the components r, pd, ▽, ▽̂ as first-order logic formulas, and using an SMT solver (Z3 [13]) to verify that the requirements are fulfilled for all traces of the program.

As mentioned in Section 3.4, all the checks are local and require observing a bounded number of steps starting from a given σ. The only facet of the criteria that is difficult to encode is that they are required to hold for the reachable states (and not for arbitrary states). Of course, if we are able to ascertain that they hold for *all* σ ∈ Σ, including unreachable states, then the result is sound. However, for some programs and squeezers, the required properties (especially simulation) do not hold universally, but are violated by unreachable states. To cope with this situation without having to manually provide invariants that capture properties of the reachable states, we use a CHC solver, Spacer [23], which is part of Z3, to check whether all the reachable states of the unbounded-state system induced by the input program satisfy these properties. This can be seen as a reduction from the problem of verifying the premises of Theorem 1 to that of verifying a safety property.

# 5 Empirical Evaluation

We implemented our complexity analyzer as a publicly available tool, SqzComp, which receives a program in a subset of C and produces recurrence relations. SqzComp is written in C++, using the Z3 C++ API [13], and uses Spacer [23] via its SMTLIB2-compatible interface. Since our squeezers may remove elements from arrays, we initially encoded arrays as SMT sequences. However, we found that it is beneficial to


Table 1. Experimental results. In array programs, A denotes an array. x, y, z, n, m, k, a are integer variables.

restrict squeezers to only remove the first or last elements of an array, resulting in a more efficient encoding with the theory of arrays. For the base case of generated recurrences, we use the symbolic execution engine KLEE [11] to bound the total number of iterations by a constant.

### 5.1 Experiments

We evaluated our tool, SqzComp, on a variety of benchmark programs taken from [16], as well as on three additional programs: the binary counter example from Section 2, a subsets example described in Section 5.2, and an example computing monotone sequences. These examples exhibit intricate time complexities. From the benchmark suite of [16] we filtered out non-deterministic programs, as well as programs that use syntactic constructs our frontend cannot currently handle. We compared SqzComp to CoFloCo [16], the state-of-the-art tool for complexity analysis of imperative programs.

Table 1 summarizes the results of our experiments. The first column presents the name of the program, which describes its characteristics (each of the "two-phase loop" programs consists of a loop with an if statement, where the branch executed changes starting from some iteration). The second column specifies the real complexity, while the following two columns present the bounds inferred by SqzComp and by CoFloCo, respectively. (For SqzComp, the reported bounds are the solutions of the recurrences

```
1 void subsets(uint n, uint k, uint m) {
2 uint I[k]; int j = 0; bool f = true;
3 while (j >= 0) {
4 if (j >= k) /*start left scan*/{f=false; j--;}
5 else if (j==0 && f) /*init*/{f=true;I[0]=m;j++;}
6 else if (f) /*right fill*/{f=true;I[j]=I[j-1]+1;j++;}
7 else if (I[j]>=n-k+j)/*left scan*/{f=false; j--;}
8 else /*start right fill*/{f=true; I[j]=I[j]+1;j++;}
9 }}
```

```
squeezer(uint I[], uint n, uint k, uint m, int j, bool f) {
 if (I[0]==m && j>0) { m++; remove I[0]; k--; j--; }
 else if (I[0]==m) { m++; remove I[0]; k--; }
 else { m++; }
}
```
Fig. 4. An example program that produces all subsets of {m, . . . , n − 1} of size k; below is the synthesized squeezer.

output by the tool.) The fourth and fifth columns present SqzComp's analysis running time and the number of segments used in the analysis, respectively.

CoFloCo's analysis time is always on the order of 0.1 seconds, whether or not it succeeds in finding a complexity bound. Our analysis is considerably slower, mostly due to the naïve implementation of the synthesizer. When both CoFloCo and SqzComp succeed, the bounds inferred by CoFloCo are sometimes tighter.

However, SqzComp manages to find tight complexity bounds for the new examples, which are not solved by CoFloCo and, to the best of our knowledge, are beyond the reach of existing tools. (We also encoded the new examples as OCaml programs and ran the tool of [20] on them; it failed to infer bounds.)

### 5.2 Case study: Subsets example

This subsection presents one challenging example from our benchmarks, the subsets example, and the details of its complexity analysis. Notably, our method is able to infer a binomial bound, which is asymptotically tight.

The code, shown in Fig. 4, iterates over all the subsets of {m,...,n-1} of size k. The "current" subset is maintained in an array I of length k, which is always kept sorted, thus avoiding generating the same set more than once. The first k iterations of the loop fill the array with the values {m, m+1,...,m+k-1}, which represent the first subset generated. This is taken care of by the branches at lines 5 and 6, which perform a "right fill" phase, filling the array with an ascending sequence starting from m at I[0]. Once the first k iterations are done, j reaches the end of the array (j=k), and so the next iteration executes line 4, turning off the flag f and signifying that the array should now be scanned leftwards. In each successive iteration, j is decreased, looking for the rightmost element that can be incremented. For example, if n = 8 and I = [2, 6, 7], this rightmost element is I[0] = 2. After that element is incremented, the flag f is turned on again, completing the "left scan" phase and starting a "right fill" phase.
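This control flow can be checked by transliterating the loop of Fig. 4 into Python and recording the contents of I whenever j reaches k (i.e., whenever line 4 fires); the port below is our own and only mirrors the figure.

```python
from itertools import combinations

def subsets(n, k, m):
    """Transliteration of the loop in Fig. 4, collecting each generated subset."""
    I = [0] * k
    j, f = 0, True
    out = []
    while j >= 0:
        if j >= k:                # start left scan: a full subset is ready
            out.append(tuple(I)); f = False; j -= 1
        elif j == 0 and f:        # init
            f = True; I[0] = m; j += 1
        elif f:                   # right fill
            f = True; I[j] = I[j - 1] + 1; j += 1
        elif I[j] >= n - k + j:   # left scan
            f = False; j -= 1
        else:                     # start right fill
            f = True; I[j] = I[j] + 1; j += 1
    return out

# Every size-3 subset of {1,...,4} is produced exactly once.
assert sorted(subsets(5, 3, 1)) == list(combinations(range(1, 5), 3))
```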

Fig. 5. An illustration of the 2-partitioned simulation for the subsets example. In the univariate case, the rank of the upper trace is n − m and that of the lower traces is n − m − 1. In the multivariate case, the upper trace has rank (n−m, k), and the lower traces have ranks (n−m−1, k−1) and (n−m−1, k).

*A univariate recurrence* Consider the rank function r(I, n, k, m, j, f) = n − m, defined with respect to (ℕ, <), and the squeezer shown below the program in Fig. 4. The squeezer observes the first element of the array: if it is equal to m (the lower bound of the range), it removes it from the array, shrinking its size (k) by one. It then adjusts the index j to keep pointing to the same element, unless j = 0, in which case that element is removed. This squeezer forms a 2-partitioned simulation, as illustrated by the traces in Fig. 5. All states are (1, 1)-stuttering, except for σ₀, which is (2, 1)-stuttering, as caused by the removal of I[0] when j = 0. The rank bounding function is ▽̂(ρ, i) = ρ − 1 for i ∈ {1, 2}. We therefore obtain the following recurrence relation:

$$\operatorname{comp}\_x(\rho) \le 1 + \operatorname{comp}\_x(\rho - 1) + \operatorname{comp}\_x(\rho - 1).$$

The base of the recurrence is compx(0) = 1, leading to the solution compx(ρ) ≤ 2<sup>ρ+1</sup> − 1. This means that for an initial state, comps(I, n, k, m, 0, true) ≤ compx(n − m) ≤ 2<sup>n−m+1</sup> − 1.
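Taking the recurrence with equality (its worst case), the closed form can be double-checked by unfolding:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def comp(rho):
    # Worst case of comp_x(rho) <= 1 + 2 * comp_x(rho - 1), base comp_x(0) = 1.
    return 1 if rho == 0 else 1 + 2 * comp(rho - 1)

for rho in range(10):
    assert comp(rho) == 2 ** (rho + 1) - 1
print(comp(9))  # 1023
```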

*A multivariate recurrence* Consider an alternative rank definition r(I, n, k, m, j, f) = (n − m, k), defined with respect to (ℕ × ℕ, <), where '<' denotes the lexicographic order, together with the same squeezer and partition as before. The rank bounding function is now ▽̂((ρ₁, ρ₂), i) = (ρ₁ − 1, ρ₂ − 1) if i = 1, and (ρ₁ − 1, ρ₂) if i = 2. The corresponding recurrence relation is:

$$comp\_x(\rho\_1, \rho\_2) \le 1 + comp\_x(\rho\_1 - 1, \rho\_2 - 1) + comp\_x(\rho\_1 - 1, \rho\_2)$$

with base compx(0, \_) = 1, resulting in the solution compx(ρ₁, ρ₂) ≤ $\binom{\rho\_1+2}{\rho\_2}$. That is, for an initial state, comps(I, n, k, m, 0, true) ≤ compx(n − m, k) ≤ $\binom{n-m+2}{k}$.
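The binomial form of the solution is no accident: the two recursive terms of such a recurrence combine via Pascal's rule, C(a, b−1) + C(a, b) = C(a+1, b), which is what makes a binomial coefficient a natural candidate solution. A quick numeric confirmation of the identity:

```python
from math import comb

# Pascal's rule: C(a, b-1) + C(a, b) == C(a+1, b). math.comb returns 0 when
# the lower index exceeds the upper one, so boundary cases are covered too.
for a in range(12):
    for b in range(1, a + 3):
        assert comb(a, b - 1) + comb(a, b) == comb(a + 1, b)
print("Pascal's rule holds on the tested range")
```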

Interestingly, this example demonstrates that the same squeezer may yield different recurrences when different ranks (and rank bounding functions) are considered. It also demonstrates a case where different segments of a trace are mapped to mini-traces of different ranks.

# 6 Related Work

This section focuses on existing methods for *static* complexity analysis of *imperative* programs. Dynamic profiling and analysis [26] are a separate research area, more related to testing, and generally do not provide formal guarantees. We further focus on works that determine *asymptotic* complexity bounds and use the number of iterations executed as their cost model; we refrain from thoroughly covering previous techniques that analyze complexity at the instruction level.

*Static cost analysis* The seminal work of [28] defined a two-step meta-framework in which recurrence relations are extracted from the underlying program, and then analyzed to provide closed-form upper bounds. Broadly speaking, cost relations are a generalized framework that captures the essence of most of the works mentioned in this section.

[4] and [16] infer cost relations for imperative programs written in Java and C, respectively. Cost relations resemble somewhat limited C procedures: they are capable of recursive calls to other cost relations, and they can handle non-determinism that arises either as a consequence of a direct nondet() in the program, or as a result of the inherent imprecision of static analysis. These works define a separate cost relation function for every basic block of the program, and then form chains according to the control flow graph of the program. They use numerical abstract domains to support a context-sensitive analysis of whether a chain of visits to specific basic blocks is feasible. Once all infeasible chains are removed, a disjunctive analysis determines an overall approximation of the heaviest chain, representing the maximal number of iterations.

[19] instruments the code with multiple counters that are automatically inserted, initialized and incremented at various points. These ghost counters make it possible to infer an overall complexity bound by applying abstract interpretation over appropriate numeric domains. [18] and [17] apply code transformations to represent multi-path loops and nested loops in a canonical way. Then, paths connecting pairs of "interesting" code points π₁, π₂ (loop headers etc.) are identified, in a way that satisfies certain properties; for instance, that π₁ is reached twice *without* reaching π₂. The path property induces progress invariants, which are then analyzed to infer the overall complexity bound.

[24] defines an abstraction of the program as a *size-change graph*, where transition edges of the control flow graph are annotated to capture sound over-approximation relations between integer variables. The graph is then searched for infinitely decreasing sequences, represented as words in an ω-regular language. This representation concisely characterizes program termination. [29] then harnesses the size-change abstraction of [24] to analyze the complexity of imperative programs. First, they apply standard program transformations, such as pathwise analysis, to summarize inner nested loops. Then, they heuristically define a set of scalar ranking functions they call norms. These norms are somewhat similar to our rank function in the sense that they help abstract away program parts that do not affect its complexity. The program is then represented as a size-change graph, and multi-path contextualization [25] prunes subsequent transitions that are infeasible.

[8] introduces *difference constraints* in the context of termination, to bound a variable x in the current iteration by some y from the previous iteration plus a constant c: x ≤ y + c. [27] extends difference constraints to complexity analysis. Indeed, it is quite often the case that ideas from the area of program termination are assimilated into complexity analysis, and vice versa. They exploit the observation that typical operations on loop counters, such as increment, decrement and reset, are essentially expressible as difference constraints. They design an abstraction based on the domain of difference constraints, and obtain relevant invariants which are then used to determine upper bounds. [10] is very similar, except that it represents a program as an integer transition system and allows nonlinear numerical constraints and ranking functions. As we mentioned earlier, all of these approaches are based on identifying the progress of executions over time, characterizing the progress between two given points in the program. In contrast, our approach allows reasoning over state size and compares whole executions.

*Squeezers.* The notion of squeezers was introduced by [22] for the sake of safety verification. As discussed in Section 1, the challenges in complexity analysis are different, and require additional ingredients beyond squeezers. [15,1,2] introduce *well-structured transition systems*, where a well-quasi-order (wqo) on the set of states induces a simulation relation. This property ensures decidability of safety verification for such systems (via a backward reachability algorithm). Our use of squeezers that decrease the rank of a state and induce a sort of simulation relation may resemble the wqo of a well-structured transition system. However, there are several key differences: we do not require the order (which is defined on ranks) to be a wqo. Further, we do not require a simulation relation between *any* states whose ranks are ordered, only between a state and its squeezed counterpart. Notably, our work considers complexity analysis rather than safety verification.

# 7 Conclusion

This work introduces a novel framework for run-time complexity analysis. The framework supports the derivation of recurrence relations based on inductive reasoning, where the form of induction depends on the choice of a squeezer (and rank bounding function). The new approach thus offers more flexibility than the classical methods, where induction is coupled with the time dimension. For example, when the rank captures the "state size", the approach mimics induction over the space dimension, reasoning about whole traces and alleviating the need to describe the intricate development of states over time. We demonstrate that such squeezers and rank bounding functions, which we manage to synthesize automatically, facilitate complexity analysis for programs that are beyond the reach of existing methods. Thanks to the simplicity and compactness of these ingredients, even a rather naïve enumeration was able to find them efficiently.

Acknowledgements. The research leading to these results has received funding from the European Research Council under the European Union's Horizon 2020 research and innovation programme (grant agreement No [759102-SVIS]). This research was partially supported by the United States-Israel Binational Science Foundation (BSF) grants No. 2016260 and 2018675, the Israeli Science Foundation (ISF) grants No. 1996/18, 1810/18, 243/19 and 2740/19, and the Pazy Foundation.

# References


struction and Analysis of Systems. pp. 337–340. TACAS'08/ETAPS'08, Springer-Verlag, Berlin, Heidelberg (2008)




# **Complete trace models of state and control**

Guilhem Jaber<sup>1</sup> and Andrzej S. Murawski<sup>2</sup>

<sup>1</sup> Université de Nantes, LS2N CNRS, Inria, Nantes, France guilhem.jaber@univ-nantes.fr <sup>2</sup> University of Oxford, Oxford, UK andrzej.murawski@cs.ox.ac.uk

**Abstract.** We consider a hierarchy of four typed call-by-value languages with either higher-order or ground-type references and with either call/cc or no control operator.

Our first result is a fully abstract trace model for the most expressive setting, featuring both higher-order references and call/cc, constructed in the spirit of operational game semantics. Next we examine the impact of suppressing higher-order references and call/cc in contexts and provide an operational explanation for the game-semantic conditions known as visibility and bracketing respectively. This allows us to refine the original model to provide fully abstract trace models of interaction with contexts that need not use higher-order references or call/cc. Along the way, we discuss the relationship between error- and termination-based contextual testing in each case, and relate the two to trace and complete trace equivalence respectively.

Overall, the paper provides a systematic development of operational game semantics for all four cases, which represent the state-based face of the so-called semantic cube.

**Keywords:** contextual equivalence, operational game semantics, higher-order references, control operators

# **1 Introduction**

Research into contextual equivalence has a long tradition in programming language theory, due to its fundamental nature and applicability to numerous verification tasks, such as the correctness of compiler optimisations. Capturing contextual equivalence mathematically, i.e. the full abstraction problem [26], has been an important driving force in denotational semantics, which led, among others, to the development of game semantics [2,12]. Game semantics models computation through sequences of question- and answer-moves by two players, traditionally called O and P, who play the role of the context and the program respectively. Because of its interactive nature, it has often been referred to as a middle ground between denotational and operational semantics.

The full version is available at https://hal.archives-ouvertes.fr/hal-03116698.

<sup>©</sup> The Author(s) 2021

N. Yoshida (Ed.): ESOP 2021, LNCS 12648, pp. 348–374, 2021. https://doi.org/10.1007/978-3-030-72019-3\_13

Over the last three decades the game-semantic approach has led to numerous fully abstract models for a whole spectrum of programming paradigms. Most papers in this strand follow a rather abstract pattern when presenting the models, emphasising structure and compositionality, often developing a correspondence with a categorical framework along the way to facilitate proofs. The operational intuitions behind the games are somewhat obscured in this presentation, and left to be discovered through a deeper exploration of proofs.

In contrast, operational game semantics aims to define models in which the interaction between the term and the environment is described through a carefully instrumented labelled transition system (LTS), built using the syntax and operational semantics of the relevant language. Here, the derived trace semantics can be shown to be fully abstract. In this line of work, the dynamics is described more directly and provides operational intuitions about the meaning of moves, while not immediately providing structural insight into the shape of the traces.

In this paper, we follow the operational approach and present a whole hierarchy of trace models for higher-order languages with varying access to higher-order state and control. As a vehicle for our study, we use HOSC, a call-by-value higher-order language equipped with general references and continuations. We also consider its sublanguages GOSC, HOS and GOS, obtained respectively by restricting storage to ground values, by removing continuations, and by imposing both restrictions. We study contextual testing of a class of HOSC terms using contexts from each of the languages **x** ∈ {HOSC, GOSC, HOS, GOS}; we write **x** to refer to each case. Our working notion of convergence will be error reachability, where an error is represented by a free variable. Accordingly, at the technical level, we will study a family of equivalence relations $\cong^{\mathbf{x}}_{err}$, each corresponding to contextual testing with contexts from **x**, where contexts have the extra power to abort the computation.

Our main results are trace models $\mathbf{Tr}_{\mathbf{x}}(\Gamma \vdash M)$ for each **x** ∈ {HOSC, GOSC, HOS, GOS}, which capture $\cong^{\mathbf{x}}_{err}$ through trace equivalence:

$$\Gamma \vdash M_1 \cong^{\mathbf{x}}_{err} M_2 \text{ if and only if } \mathbf{Tr}_{\mathbf{x}}(\Gamma \vdash M_1) = \mathbf{Tr}_{\mathbf{x}}(\Gamma \vdash M_2).$$

It turns out that, for contexts with control (i.e. **x** ∈ {HOSC, GOSC}), $\cong^{\mathbf{x}}_{err}$ coincides with the standard notion of contextual equivalence based on termination, written $\cong^{\mathbf{x}}_{ter}$. However, in the other two cases, the former is strictly more discriminating than the latter. We explain how to account for this difference in the trace-based setting, using complete traces.

A common theme that has emerged in game semantics is the comparative study of the power of contexts, as it turned out to be possible to identify combinatorial conditions, namely visibility [3] and bracketing [22], that correspond to contextual testing in the absence of general references and control constructs respectively. In brief, visibility states that not all moves can be played, but only those that are enabled by a "visible part" of the interaction, which could be thought of as the functions currently in scope. Bracketing in turn imposes a discipline on answers, requiring that the topmost question be answered first. In the paper, we provide an operational reconstruction of both conditions.

σ, τ ::= Unit | Int | Bool | ref τ | τ × σ | τ → σ | cont τ

U, V ::= () | **tt** | **ff** | n̂ | x | ⟨U, V⟩ | λx^τ.M | **rec** y(x^τ).M | cont_τ K

M, N ::= V | ⟨M, N⟩ | π_i M | M N | ref_τ M | !M | M := N | if M₁ M₂ M₃ | M ⊕ N | M ⊙ N | M = N | call/cc_τ(x.M) | throw_τ M to N

K ::= • | ⟨V, K⟩ | ⟨K, M⟩ | π_i K | V K | K M | ref_τ K | !K | V := K | K := M | if K M N | K ⊕ M | V ⊕ K | K ⊙ M | V ⊙ K | K = M | V = K | throw_τ V to K | throw_τ K to M

C ::= • | ⟨M, C⟩ | ⟨C, M⟩ | π_i C | λx^τ.C | **rec** y(x^τ).C | M C | C M | ref_τ C | !C | C := M | M := C | if C M N | if M C N | if M N C | C ⊕ M | M ⊕ C | C ⊙ M | M ⊙ C | C = M | M = C | call/cc_τ(x.C) | throw_τ C to M

Notational conventions: x, y ∈ **Var**, ℓ ∈ **Loc**, n ∈ ℤ, i ∈ {1, 2}, ⊕ ∈ {+, −, ∗}, ⊙ ∈ {=, <}. Syntactic sugar: let x = M in N stands for (λx.N)M (if x does not occur in N we also write M; N).

### **Fig. 1.** HOSC syntax

Overall, we propose a unifying framework for studying higher-order languages with state and control, which we hope will make the techniques of (operational) game semantics clearer to the wider community. The construction of the fully abstract LTSs is by no means automatic, as there is no general methodology for extracting trace semantics from game models. Some attempts in that direction have been reported in [25], but the type discipline discussed there is far too weak to be applied to the languages we study. As the most immediate precursor to our work, we see the trace model of contextual interactions between HOS contexts and HOS terms from [23]. In comparison, the models developed in this paper are more general, as they consider the interaction between HOSC terms and contexts drawn from any of the four languages ranged over by **x**.

In the 1990s, Abramsky proposed a research programme, originally called the semantic cube [1], which concerned investigating extensions of the purely functional programming language PCF along various axes. From this angle, the present paper is an operational study of a semantic diamond of languages with state, with GOS at the bottom, extending towards HOSC at the top, either via GOSC or HOS.

# **2 HOSC**

The main objects of our study will be the language HOSC along with its fragments GOSC, HOS and GOS. HOSC is a higher-order programming language equipped with general references and continuations.

Syntax HOSC syntax is given in Figure 1. Assuming countably infinite sets **Loc** (locations) and **Var** (variables), HOSC typing judgments take the form

$$\begin{array}{ll}
(K[(\lambda x^{\sigma}.M)V],h) & \rightarrow (K[M\{V/x\}],h)\\
(K[(\mathbf{rec}\ y(x^{\sigma}).M)V],h) & \rightarrow (K[M\{V/x\}\{\mathbf{rec}\ y(x^{\sigma}).M/y\}],h)\\
(K[\pi_{i}\langle V_{1},V_{2}\rangle],h) & \rightarrow (K[V_{i}],h)\\
(K[\mathsf{if}\ \mathbf{tt}\ M_{1}\ M_{2}],h) & \rightarrow (K[M_{1}],h)\\
(K[\mathsf{if}\ \mathbf{ff}\ M_{1}\ M_{2}],h) & \rightarrow (K[M_{2}],h)\\
(K[\hat{n} \oplus \hat{m}],h) & \rightarrow (K[\widehat{n \oplus m}],h)\\
(K[\hat{n} \odot \hat{m}],h) & \rightarrow (K[b],h) \quad \text{with } b=\mathbf{tt} \text{ if } n \odot m, \text{ otherwise } b=\mathbf{ff}\\
(K[\mathsf{ref}_{\tau}\,V],h) & \rightarrow (K[\ell],h\cdot[\ell \mapsto V]) \quad \text{with } \ell \notin \mathrm{dom}(h)\\
(K[\,!\ell\,],h) & \rightarrow (K[h(\ell)],h)\\
(K[\ell := V],h) & \rightarrow (K[()],h[\ell \mapsto V])\\
(K[\ell = \ell'],h) & \rightarrow (K[b],h) \quad \text{with } b=\mathbf{tt} \text{ if } \ell=\ell', \text{ otherwise } b=\mathbf{ff}\\
(K[\mathsf{call/cc}_{\tau}(x.M)],h) & \rightarrow (K[M\{\mathsf{cont}_{\tau}\,K/x\}],h)\\
(K[\mathsf{throw}_{\tau}\ V\ \mathsf{to}\ \mathsf{cont}_{\tau}\,K'],h) & \rightarrow (K'[V],h)
\end{array}$$

**Fig. 2.** Operational reduction for HOSC

Σ; Γ ⊢ M : τ, where Σ and Γ are finite partial functions that assign types to locations and variables respectively. In typing judgements, we often write Σ as shorthand for Σ; ∅ (closed) and Γ as shorthand for ∅; Γ (location-free). Similarly, ⊢ M : τ means ∅; ∅ ⊢ M : τ.

Operational semantics A heap h is a finite type-respecting map from **Loc** to values. We write h : (Σ; Γ) if dom(Σ) ⊆ dom(h) and Σ; Γ ⊢ h(ℓ) : σ for all (ℓ, σ) ∈ Σ. The operational semantics of HOSC reduces pairs (M, h), where Σ; Γ ⊢ M : τ and h : (Σ; Γ). The rules are given in Figure 2, where {·} denotes (capture-avoiding) substitution. We write (M, h) ⇓ter if there exist V, h' such that (M, h) →* (V, h') and V is a value.
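For informal intuition only, and not part of the paper's development, the way a few rules of Figure 2 thread the heap h through reduction can be mimicked in Python; all encodings and names here (`Loc`, `eval_`, tagged tuples) are our own.

```python
# Toy evaluator for a fragment of Figure 2: application, ref_t V, !l, l := V.
# Terms are tagged tuples; anything that is not a tagged tuple is a value
# (tuples are reserved for term constructors in this toy encoding).
import itertools

_fresh = itertools.count()

class Loc:
    """A location l in Loc, pointing into the heap h (a Python dict)."""
    def __init__(self):
        self.id = next(_fresh)

def eval_(term, h):
    tag = term[0] if isinstance(term, tuple) else None
    if tag == 'ref':    # (K[ref V], h) -> (K[l], h . [l -> V])
        v = eval_(term[1], h)
        l = Loc(); h[l] = v
        return l
    if tag == 'get':    # (K[!l], h) -> (K[h(l)], h)
        return h[eval_(term[1], h)]
    if tag == 'set':    # (K[l := V], h) -> (K[()], h[l -> V])
        l = eval_(term[1], h); h[l] = eval_(term[2], h)
        return ()
    if tag == 'app':    # (K[(lam x.M)V], h) -> (K[M{V/x}], h), via closures
        return eval_(term[1], h)(eval_(term[2], h))
    return term         # values evaluate to themselves

# let x = ref 0 in (x := !x + 1)
h = {}
x = eval_(('ref', 0), h)
eval_(('set', x, ('app', (lambda n: n + 1), ('get', x))), h)
```

After the run above, dereferencing x yields 1, mirroring the assignment rule's heap update.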

We distinguish the following fragments of HOSC.


**Definition 2.** Given a HOSC term Γ ⊢ M : τ, we refer to the types in Γ and τ as **boundary types**. Let **x** ∈ {HOSC, GOSC, HOS, GOS}. We say that a HOSC term Γ ⊢ M : τ has an **x** boundary if all of its boundary types are from **x**.

Remark 1. Note that typing derivations of HOSC terms with an **x** boundary may contain arbitrary HOSC types, as long as the final typing judgment uses types from **x** only. Consequently, if **x** ≠ HOSC, the HOSC terms with an **x** boundary form a strict superset of the terms of **x**.

Next we introduce several notions of contextual testing for HOSC terms, using various kinds of contexts. For a start, we introduce the classic notion of contextual approximation, based on observing termination. The notions are parameterized by **x**, indicating which language is used to build the testing contexts. We write Γ ⊢ C : τ → τ' if Γ, x : τ ⊢ C[x] : τ', and Γ ⊢ C ÷ τ if Γ ⊢ C : τ → τ' for some τ'.

**Definition 3 (Contextual Approximation).** Let **x** ∈ {HOSC, GOSC, HOS, GOS}. Given HOSC terms Γ ⊢ M1, M2 : τ with an **x** boundary, we define $\Gamma \vdash M_1 \lesssim^{\mathbf{x}}_{ter} M_2$ to hold when, for all contexts ⊢ C ÷ τ built from the syntax of **x**, if (C[M1], ∅) ⇓ter then (C[M2], ∅) ⇓ter.

We also consider another way of testing, based on observing whether a program can reach a breakpoint (error point) inside a context. Technically, the breakpoints are represented as occurrences of a special free error variable err : Unit → Unit. Reaching a breakpoint then corresponds to convergence to a stuck configuration of the form (K[err ()], h): we write (M, h) ⇓err if there exist K, h' such that (M, h) →* (K[err ()], h').
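The role of the breakpoint can be pictured operationally as follows; this Python sketch (with invented names `ErrReached`, `context`, `reaches_err`) treats reaching K[err ()] as an observable exception raised by the context.

```python
# Illustrative model of error-based testing: the context carries a
# breakpoint err, and reaching it is observed as an exception.
class ErrReached(Exception):
    """Signals that evaluation reached K[err ()]."""

def err():
    raise ErrReached

def context(term):
    """A context C[-] whose breakpoint fires iff the plugged term returns 0."""
    if term() == 0:
        err()

def reaches_err(term):
    try:
        context(term)
        return False
    except ErrReached:
        return True

M1 = lambda: 0   # (C[M1], empty heap) reaches the breakpoint
M2 = lambda: 1   # (C[M2], empty heap) does not
```

Thus C distinguishes M1 from M2 under error observation, exactly the kind of discrimination the relation based on ⇓err records.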

**Definition 4 (Contextual Approximation through Error).** Suppose **x** ∈ {HOSC, GOSC, HOS, GOS}. Given HOSC terms Γ ⊢ M1, M2 : τ with an **x** boundary and err ∉ dom(Γ), we define $\Gamma \vdash M_1 \lesssim^{\mathbf{x}}_{err} M_2$ to hold when, for all contexts err : Unit → Unit ⊢ C ÷ τ built from **x**-syntax, if (C[M1], ∅) ⇓err then (C[M2], ∅) ⇓err.

For the languages in question, it will turn out that $\lesssim^{\mathbf{x}}_{err}$ is at least as discriminating as $\lesssim^{\mathbf{x}}_{ter}$ for each **x** ∈ {HOSC, GOSC, HOS, GOS}, and that the two coincide for **x** ∈ {HOSC, GOSC}. We will write $\cong^{\mathbf{x}}_{err}$ and $\cong^{\mathbf{x}}_{ter}$ for the associated equivalence relations.

For higher-order languages with state and control, it is well known that contextual testing can be restricted to evaluation contexts after instantiating the free variables of terms to closed values (the so-called closed instances of use, CIU). Let us write Σ ⊢ γ : Γ for substitutions γ such that, for any (x, σ_x) ∈ Γ, the term γ(x) is a value satisfying Σ ⊢ γ(x) : σ_x. Then M{γ} stands for the outcome of applying γ to M.

**Definition 5 (CIU Approximation).** Let **x** ∈ {HOSC, GOSC, HOS, GOS} and let Γ M1, M<sup>2</sup> : τ be HOSC terms with an **x** boundary.


Results stating that "CIU tests suffice" are referred to as CIU lemmas. A general framework for obtaining such results for higher-order languages with effects was developed in [10,33]. The results stated therein are for termination-based testing, i.e. ⇓ter , but adapting them to ⇓err is not problematic.

**Lemma 1 (CIU Lemma).** Let **x** ∈ {HOSC, GOSC, HOS, GOS} and **y** ∈ {ter, err}. Then we have $\Gamma \vdash M_1 \lesssim^{\mathbf{x}}_{\mathbf{y}} M_2$ iff $\Gamma \vdash M_1 \lesssim^{\mathbf{x}(ciu)}_{\mathbf{y}} M_2$.

The preorders $\lesssim^{\mathbf{x}}_{err}$ will be the central object of study in the paper. Among others, we shall provide alternative characterizations of them using trace semantics. The characterizations will apply to a class of terms that we call cr-free.

**Definition 6.** A HOSC term Γ ⊢ M : τ is **cr-free** if it contains no occurrences of cont_σ K and no locations, and its boundary types are cont- and ref-free.

We stress that the boundary restriction applies to Γ and τ only: subterms of M may well contain arbitrary HOSC types and occurrences of ref_σ, call/cc_σ, throw_σ for any σ. The majority of HOSC/GOSC/HOS/GOS examples studied in the literature, e.g. [28,4,8], are actually cr-free. We will revisit some of them as Examples 6, 7, 10. The fact that cr-free terms may not contain subterms cont_τ K or locations ℓ is not really a restriction, as these are run-time constructs rather than features meant to be used directly by programmers. Finally, we note that the boundary of a cr-free term is an **x** boundary for any **x** ∈ {HOSC, GOSC, HOS, GOS}. Thus, we can consider approximation between cr-free terms for any **x** from the range, i.e. the notions $\lesssim^{\mathbf{x}}_{err}$, $\lesssim^{\mathbf{x}}_{ter}$ are all applicable. Consequently, cr-free terms provide a common setting in which the discriminating power of HOSC, GOSC, HOS and GOS contexts can be compared. We discuss the scope for extending our results outside of the cr-free fragment, and for richer type systems, in Section 7.

# **3 HOSC[HOSC]**

Recall that $\lesssim^{\mathrm{HOSC}}_{err}$ concerns testing HOSC terms with HOSC contexts. Accordingly, we call this case HOSC[HOSC]. For cont_σ(K)-free terms, we show that $\lesssim^{\mathrm{HOSC}}_{err}$ and $\lesssim^{\mathrm{HOSC}}_{ter}$ coincide, which follows from the lemma below.

**Lemma 2.** Let Γ ⊢ M1, M2 be HOSC terms not containing any occurrences of cont_τ(K).


In what follows, after introducing several preliminary notions, we shall design a labelled transition system (LTS) whose traces will turn out to capture the contextual interactions involved in testing cr-free terms according to $\lesssim^{\mathrm{HOSC}}_{err}$. This will enable us to capture $\lesssim^{\mathrm{HOSC}}_{err}$ via trace inclusion. Actions of the LTS will refer to functions and continuations in a symbolic way, using typed names.

### **3.1 Names and abstract values**

**Definition 7.** Let FNames = ⊎_{σ,σ'} FNames_{σ→σ'} be the set of **function names**, partitioned into mutually disjoint countably infinite sets FNames_{σ→σ'}. We will use f, g to range over FNames and write f : σ → σ' for f ∈ FNames_{σ→σ'}.

Analogously, let CNames = ⊎_σ CNames_σ be the set of **continuation names**. We will use c, d to range over CNames, and write c : σ for c ∈ CNames_σ. Note that these names represent continuations, so the "real" type of c is cont σ, but we write c : σ for the sake of brevity. We assume that CNames and FNames are disjoint and let Names = FNames ⊎ CNames. Elements of Names will be woven into various constructions in the paper, e.g. terms, heaps, etc. We will then write ν(X) to refer to the set of names used in some entity X.

Because of the shape of boundary types in cr-free terms and, in particular, the presence of product types, the values that will be exchanged between the context and the program take the form of tuples consisting of (), integers, booleans and functions. To describe such scenarios, we introduce the notion of *abstract values*, which are patterns that match such values. Abstract values are generated by the grammar

A, B ::= () | **tt** | **ff** | n̂ | f | ⟨A, B⟩

with the proviso that, in any abstract value, a name may occur at most once. As function names are intrinsically typed, we can assign types to abstract values in the obvious way, writing A : τ .

# **3.2 Actions and traces**

Our LTS will be based on four kinds of actions, listed below. Each action is equipped with a *polarity*, which is either Player (P) or Opponent (O): P-actions describe interaction steps made by the tested term, while O-actions involve the context.


In what follows, **a** is used to range over actions. We will say that a name is *introduced* by an action **a** if it is sent or received in **a**. If **a** is an O-action (resp. P-action), we say that the name was introduced by O (resp. P). An action **a** is *justified* by another action **a**' if the name that **a** uses to communicate, i.e. f in questions (f̄(A, c), f(A, c)) and c in answers (c̄(A), c(A)), was introduced by **a**'.

We will work with sequences of actions of a very special shape, specified below. The definition assumes two given sets of names, N<sup>P</sup> and NO, which represent names that have already been introduced by P and O respectively.

**Definition 8.** Let N_O, N_P ⊆ Names. An (N_O, N_P)-**trace** is a sequence t of actions in which no name is introduced twice and, for each action **a**, the name that **a** uses to communicate satisfies one of the following:

- **a** = f̄(A, c) with f ∈ N_O, or **a** = c̄(A) with c ∈ N_O, or **a** = f(A, c) with f ∈ N_P, or **a** = c(A) with c ∈ N_P, or
- the name has been introduced by an earlier action **a**' of opposite polarity.

Note that, due to the shape of actions, a continuation name can only be introduced/justified by a question. Moreover, because names are never introduced twice, if **a** justifies **a**' then **a** is uniquely determined in a given trace. Readers familiar with game semantics will recognize that traces are very similar to alternating justified sequences, except that traces need not start with O.

Example 1. Let (N_O, N_P) = ({c}, ∅), where c : τ = ((Unit → Unit) → Unit) × (Unit → Int). Then the following sequence is an (N_O, N_P)-trace:

$$t_1 = \bar{c}(\langle g_1, g_2\rangle)\ g_1(f_1, c_1)\ \bar{f}_1((), c_2)\ c_2(())\ \bar{c}_1(())\ c_2(())\ \bar{c}_1(())\ g_2((), c_3)\ \bar{c}_3(2)$$

where g_1 : (Unit → Unit) → Unit, g_2 : Unit → Int, f_1 : Unit → Unit, c_1, c_2 : Unit and c_3 : Int.
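The trace conditions can also be checked mechanically. The sketch below, which uses our own encoding of actions as (polarity, head name, introduced names) triples, verifies the name-introduction and polarity conditions of Definition 8 on the trace above; it deliberately ignores the finer structure of actions (questions versus answers).

```python
# Checker for the name conditions of Definition 8 (our encoding):
# an action is (polarity, head, introduced) with polarity 'P' or 'O'.
def is_trace(t, NO, NP):
    # record which side introduced each known name
    intro_by = {n: pol for pol, n in
                [('O', n) for n in NO] + [('P', n) for n in NP]}
    for pol, head, introduced in t:
        other = 'O' if pol == 'P' else 'P'
        # the communicating name must come from the opposite side
        if intro_by.get(head) != other:
            return False
        for n in introduced:          # names are introduced at most once
            if n in intro_by:
                return False
            intro_by[n] = pol
    return True

# t1 from Example 1, in our triple encoding:
t1 = [('P', 'c', ['g1', 'g2']), ('O', 'g1', ['f1', 'c1']),
      ('P', 'f1', ['c2']), ('O', 'c2', []), ('P', 'c1', []),
      ('O', 'c2', []), ('P', 'c1', []), ('O', 'g2', ['c3']),
      ('P', 'c3', [])]
```

Note that c_2 is introduced once (by P, in the third action) yet used twice by O, which the checker accepts: only *introduction* is linear.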

### **3.3 Extended syntax and reduction**

We extend the definition of HOSC to take these names into account, and refine the operational reduction using continuation names to keep track of the top-level continuation. We list all the changes below.

**–** Function names are added to the syntax as constants. Since they are meant to represent values, they are also considered to be syntactic values in the extended language.

$$\frac{f \in \mathrm{FNames}_{\sigma \to \sigma'}}{\Sigma; \Gamma \vdash f : \sigma \to \sigma'}$$

**–** Continuation names are not terms on their own. Instead, they are built into the syntax via a new construct cont<sup>σ</sup> (K, c), subject to the following typing rule.

$$\frac{\Sigma; \Gamma \vdash K : \sigma \to \sigma' \quad c \in \mathrm{CNames}_{\sigma'}}{\Sigma; \Gamma \vdash \mathsf{cont}_{\sigma}(K, c) : \mathsf{cont}\ \sigma}$$

cont<sup>σ</sup> (K, c) is a staged continuation that first evaluates terms inside K and, if this produces a value, the value is passed to c. This operational meaning will be implemented through a suitable reduction rule, to be discussed next. cont<sup>σ</sup> (K, c) is also regarded as a value. Note that we remove the old construct cont<sup>σ</sup> K from the extended syntax.

**–** The operational semantics → underpinning the LTS is based on triples (M, c, h) such that Σ; Γ ⊢ M : σ, c ∈ CNames_σ and h : Σ. The continuation name c is used to represent the surrounding context, which is left abstract. The previous operational rules → are embedded into the new reduction → using the rule below.

$$\frac{(M,h)\to(M',h')}{(M,c,h)\to(M',c,h')}$$

The two reduction rules related to continuations, previously used to define →, are not included. Instead we use the following rules, which take advantage of the extended syntax.

$$\begin{array}{c} (K[\mathsf{call/cc}_{\tau}(x.M)], c, h) \to (K[M\{\mathsf{cont}_{\tau}(K, c)/x\}], c, h) \\ (K[\mathsf{throw}_{\tau}\ V\ \mathsf{to}\ \mathsf{cont}_{\tau}(K', c')], c, h) \to (K'[V], c', h) \end{array}$$
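For intuition only: in a language without first-class continuations, the escape-only use of these two rules can be mimicked with exceptions. The Python sketch below (names `call_cc` and `_Throw` are ours) supports throwing to a continuation whose context is still active, but cannot resume a continuation after its context has returned, which is precisely the extra power O exercises later in Example 4.

```python
# Escape-only model of call/cc and throw via exceptions.
class _Throw(Exception):
    def __init__(self, tag, value):
        self.tag, self.value = tag, value

def call_cc(f):
    tag = object()                 # identifies this capture point K
    def k(v):                      # the captured continuation cont(K)
        raise _Throw(tag, v)       # "throw v to cont(K)"
    try:
        return f(k)
    except _Throw as t:
        if t.tag is tag:
            return t.value         # the context K receives v
        raise                      # a throw aimed at an enclosing capture

# Analogue of call/cc(x. 1 + throw 41 to x): the pending addition
# is discarded and the captured context receives 41.
result = call_cc(lambda k: 1 + k(41))
```

This is deliberately weaker than the cont_σ(K, c) construct above: a Python exception unwinds the stack, so a continuation cannot be re-entered twice.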

# **3.4 Configurations**

We write Vals for the extended set of syntactic values, i.e. FNames ⊆ Vals. Let ECtxs stand for the set of extended evaluation contexts, defined as K in Figure 1 taking the extended definition of values into account. Before defining the transition relation of our LTS, we discuss the shape of configurations, providing intuitions behind each component.

*Passive configurations* take the form ⟨γ, ξ, φ, h⟩ and are meant to represent stages at which the environment is to make a move.


The components satisfy healthiness conditions, implied by their role in the system. Let Σ = dom(h).


*Active configurations* take the form ⟨M, c, γ, ξ, φ, h⟩ and represent interaction steps of the term. The γ, ξ, φ, h components have already been described above. For M and c, given Σ = dom(h), we will have Σ; ∅ ⊢ M : σ, c ∈ CNames_σ and ν(M) ∪ {c} ⊆ φ \ dom(γ).

### **3.5 Transitions**

Observe that any closed value V of a cont- and ref-free type σ can be decomposed into an abstract value A (pattern) and the corresponding substitution γ (matching). The set of all such decompositions, written **AVal**σ(V ), is defined below. Given a value V of a (cr-free) type σ, **AVal**σ(V ) contains all pairs (A, γ) such that A is an abstract value and γ : ν(A) → Vals is a substitution such that A{γ} = V . More concretely,

$$\begin{array}{ll}
\mathbf{AVal}_{\sigma}(V) \triangleq \{(V, \emptyset)\} & \text{for } \sigma \in \{\mathsf{Unit}, \mathsf{Bool}, \mathsf{Int}\}\\
\mathbf{AVal}_{\sigma \to \sigma'}(V) \triangleq \{(f, [f \mapsto V]) \mid f \in \mathrm{FNames}_{\sigma \to \sigma'}\} & \\
\mathbf{AVal}_{\sigma \times \sigma'}(\langle U, V\rangle) \triangleq \{(\langle A_1, A_2\rangle, \gamma_1 \cdot \gamma_2) \mid (A_1, \gamma_1) \in \mathbf{AVal}_{\sigma}(U),\ (A_2, \gamma_2) \in \mathbf{AVal}_{\sigma'}(V)\} &
\end{array}$$

Note that, by writing ·, we mean to implicitly require that the function domains be disjoint. Similarly, when writing ⊎, we stipulate that the argument sets be disjoint.
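These conventions can be spelled out as executable checks (helper names are ours):

```python
# gamma1 . gamma2: union of finite maps, defined only on disjoint domains.
def dot(g1, g2):
    assert not (g1.keys() & g2.keys()), "domains must be disjoint"
    return {**g1, **g2}

# Disjoint union of sets, defined only when the arguments do not overlap.
def disjoint_union(s1, s2):
    assert not (s1 & s2), "sets must be disjoint"
    return s1 | s2
```

So γ · γ' is undefined (here: fails) precisely when the two substitutions mention a common name.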

Example 2. Let σ = (Int → Bool) × (Int × (Unit → Int)) and V ≡ ⟨λx^Int. x = 1, ⟨2̂, λx^Unit. 3̂⟩⟩. Then **AVal**_σ(V) equals

$$\{(\langle f, \langle \hat{2}, g\rangle\rangle,\ [f \mapsto (\lambda x^{\mathsf{Int}}. x = 1)] \cdot [g \mapsto (\lambda x^{\mathsf{Unit}}. \hat{3})]) \mid f \in \mathrm{FNames}_{\mathsf{Int} \to \mathsf{Bool}},\ g \in \mathrm{FNames}_{\mathsf{Unit} \to \mathsf{Int}}\}.$$
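A small Python transcription of the decomposition (ours; types are encoded as tagged tuples, and a single fresh function name is drawn per arrow position rather than ranging over all of FNames) reproduces the shape of this example:

```python
# Sketch of AVal_sigma(V): split a closed value of a cr-free type into
# an abstract value (pattern) and a substitution gamma (matching).
import itertools

_names = itertools.count(1)

def aval(sigma, v):
    if sigma[0] == 'base':        # Unit, Bool, Int: the pattern is v itself
        return v, {}
    if sigma[0] == 'arrow':       # functions are replaced by a fresh name f
        f = f"f{next(_names)}"
        return f, {f: v}
    if sigma[0] == 'prod':        # pairs are decomposed componentwise
        a1, g1 = aval(sigma[1], v[0])
        a2, g2 = aval(sigma[2], v[1])
        assert not (g1.keys() & g2.keys())   # gamma1 . gamma2 disjointness
        return (a1, a2), {**g1, **g2}
    raise ValueError(sigma)

# Example 2's sigma = (Int -> Bool) x (Int x (Unit -> Int)):
sigma = ('prod', ('arrow',), ('prod', ('base',), ('arrow',)))
V = (lambda x: x == 1, (2, lambda _: 3))
A, gamma = aval(sigma, V)   # A has the shape (f, (2, g))
```

Applying gamma to the names in A recovers V, i.e. A{γ} = V.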

Finally, we present the transitions of what we call the HOSC[HOSC] LTS in Figure 3.

Example 3. Below we analyse the (PQ) rule in more detail.

$$\langle K[fV], c, \gamma, \xi, \phi, h\rangle \xrightarrow{\bar{f}(A, c')} \langle \gamma \cdot \gamma' \cdot [c' \mapsto K],\ \xi \cdot [c' \mapsto c],\ \phi \uplus \nu(A) \uplus \{c'\},\ h\rangle$$
when $f : \sigma \to \sigma'$, $(A, \gamma') \in \mathbf{AVal}_{\sigma}(V)$ and $c' : \sigma'$

The use of ⊎ in φ ⊎ ν(A) ⊎ {c'} is meant to highlight the requirement that the names introduced in f̄(A, c'), i.e. ν(A) ∪ {c'}, be fresh and disjoint from φ. Moreover, note how γ and ξ are updated. In general, γ, ξ and h are updated during P-actions.

**Definition 9.** Given two configurations **C**, **C**', we write $\mathbf{C} \stackrel{\mathbf{a}}{\Longrightarrow} \mathbf{C}'$ if $\mathbf{C} \xrightarrow{\tau}{}^{*} \xrightarrow{\mathbf{a}} \mathbf{C}'$, with $\xrightarrow{\tau}{}^{*}$ representing multiple (possibly none) τ-actions. This notation is extended to sequences of actions: given $t = \mathbf{a}_1 \ldots \mathbf{a}_n$, we write $\mathbf{C} \stackrel{t}{\Longrightarrow} \mathbf{C}'$ if there exist $\mathbf{C}_1, \ldots, \mathbf{C}_{n-1}$ such that $\mathbf{C} \stackrel{\mathbf{a}_1}{\Longrightarrow} \mathbf{C}_1 \cdots \mathbf{C}_{n-1} \stackrel{\mathbf{a}_n}{\Longrightarrow} \mathbf{C}'$. We define $\mathbf{Tr}_{\mathrm{HOSC}}(\mathbf{C}) = \{t \mid \text{there exists } \mathbf{C}' \text{ such that } \mathbf{C} \stackrel{t}{\Longrightarrow} \mathbf{C}'\}$.
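The weak transition relation is easy to compute over an explicit finite LTS. In the sketch below (our toy encoding, with `'tau'` for τ), `weak_step` returns all C' such that C can reach C' by some τ-steps followed by one a-labelled step.

```python
# Weak transitions over an explicit finite LTS: edges maps each
# configuration to a list of (action, successor) pairs.
def tau_closure(edges, c):
    """All configurations reachable from c by tau-steps (including c)."""
    seen, stack = {c}, [c]
    while stack:
        for act, d in edges.get(stack.pop(), []):
            if act == 'tau' and d not in seen:
                seen.add(d)
                stack.append(d)
    return seen

def weak_step(edges, c, a):
    """All c' with c =a=> c', i.e. tau* followed by one a-step."""
    return {d for s in tau_closure(edges, c)
              for act, d in edges.get(s, []) if act == a}

# A toy LTS: 0 -tau-> 1, 1 -tau-> 2, 1 -a-> 3, 2 -a-> 4.
edges = {0: [('tau', 1)], 1: [('tau', 2), ('a', 3)], 2: [('a', 4)]}
```

Iterating `weak_step` over a sequence of actions yields the trace relation of the definition above.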

**Lemma 3.** Suppose **C** = ⟨γ, ξ, φ, h⟩ or **C** = ⟨M, c, γ, ξ, φ, h⟩ are configurations. Then the elements of $\mathbf{Tr}_{\mathrm{HOSC}}(\mathbf{C})$ are (φ \ dom(γ), dom(γ))-traces.

$$\begin{array}{ll}
(P\tau) & \langle M, c, \gamma, \xi, \phi, h\rangle \xrightarrow{\tau} \langle N, c', \gamma, \xi, \phi, h'\rangle \quad \text{when } (M, c, h) \to (N, c', h')\\[2pt]
(PA) & \langle V, c, \gamma, \xi, \phi, h\rangle \xrightarrow{\bar{c}(A)} \langle \gamma \cdot \gamma', \xi, \phi \uplus \nu(A), h\rangle \quad \text{when } c : \sigma,\ (A, \gamma') \in \mathbf{AVal}_{\sigma}(V)\\[2pt]
(PQ) & \langle K[fV], c, \gamma, \xi, \phi, h\rangle \xrightarrow{\bar{f}(A, c')} \langle \gamma \cdot \gamma' \cdot [c' \mapsto K], \xi \cdot [c' \mapsto c], \phi \uplus \nu(A) \uplus \{c'\}, h\rangle \\
& \quad \text{when } f : \sigma \to \sigma',\ (A, \gamma') \in \mathbf{AVal}_{\sigma}(V),\ c' : \sigma'\\[2pt]
(OA) & \langle \gamma, \xi, \phi, h\rangle \xrightarrow{c(A)} \langle K[A], c', \gamma, \xi, \phi \uplus \nu(A), h\rangle \quad \text{when } c : \sigma,\ A : \sigma,\ \gamma(c) = K,\ \xi(c) = c'\\[2pt]
(OQ) & \langle \gamma, \xi, \phi, h\rangle \xrightarrow{f(A, c)} \langle VA, c, \gamma, \xi, \phi \uplus \nu(A) \uplus \{c\}, h\rangle \\
& \quad \text{when } f : \sigma \to \sigma',\ A : \sigma,\ c : \sigma',\ \gamma(f) = V
\end{array}$$

NB c : σ stands for c ∈ CNamesσ.

### **Fig. 3.** HOSC[HOSC] LTS

$$\begin{array}{l}
M_1^{cwl}:\ \mathsf{let}\ x = \mathsf{ref}\ 0\ \mathsf{in}\\
\quad \mathsf{let}\ b = \mathsf{ref}\ \mathbf{ff}\ \mathsf{in}\\
\quad \langle \lambda f.\ \mathsf{if}\ \neg(!b)\ \mathsf{then}\\
\quad\quad b := \mathbf{tt};\ f();\ x := !x + 1;\\
\quad\quad b := \mathbf{ff}\\
\quad\ \mathsf{else}\ (),\ \lambda\_^{\mathsf{Unit}}.\,!x\rangle
\end{array}
\qquad
\begin{array}{l}
M_2^{cwl}:\ \mathsf{let}\ x = \mathsf{ref}\ 0\ \mathsf{in}\\
\quad \mathsf{let}\ b = \mathsf{ref}\ \mathbf{ff}\ \mathsf{in}\\
\quad \langle \lambda f.\ \mathsf{if}\ \neg(!b)\ \mathsf{then}\\
\quad\quad b := \mathbf{tt};\ \mathsf{let}\ n = !x\ \mathsf{in}\ f();\ x := n + 1;\\
\quad\quad b := \mathbf{ff}\\
\quad\ \mathsf{else}\ (),\ \lambda\_^{\mathsf{Unit}}.\,!x\rangle
\end{array}$$

**Fig. 4.** Callback-with-lock Example [4]

Example 4. In Figure 5, we show that the trace from Example 1 is generated by the configuration **C** ≜ ⟨M_1^cwl, c, ∅, ∅, {c}, ∅⟩, where M_1^cwl is given in Figure 4. We write inc ≜ λf. if ¬(!b) then (b := **tt**; f(); x := !x + 1; b := **ff**) else (), get ≜ λ_. !x and c : ((Unit → Unit) → Unit) × (Unit → Int). It is interesting to notice that in this interaction Opponent uses the continuation c_2 (which stands for •; N) twice, incrementing the counter x by two. The second time, it does so without having to call inc again, simply by resuming the continuation name c_2.
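For readers who prefer executable intuition, here is a direct Python transcription of Figure 4 (our naming; one-element lists model the references x and b). Note that an ordinary Python client cannot separate the two versions: the distinguishing interaction of Example 4 resumes the continuation c_2 twice, which has no counterpart in plain Python.

```python
# Transcription of Figure 4. Each M_i^cwl returns the pair <inc, get>.
def M1_cwl():
    x, b = [0], [False]
    def inc(f):
        if not b[0]:
            b[0] = True
            f()
            x[0] = x[0] + 1     # reads x *after* the callback returns
            b[0] = False
    def get():
        return x[0]
    return inc, get

def M2_cwl():
    x, b = [0], [False]
    def inc(f):
        if not b[0]:
            b[0] = True
            n = x[0]            # snapshots x *before* the callback
            f()
            x[0] = n + 1
            b[0] = False
    def get():
        return x[0]
    return inc, get

inc1, get1 = M1_cwl(); inc1(lambda: None)
inc2, get2 = M2_cwl(); inc2(lambda: None)
```

The lock b also makes re-entrant calls to inc from the callback no-ops, so the only observable gap between the two terms is the pre-/post-callback read of x, which becomes visible exactly when a context can re-enter the suspended increment.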

Remark 2. Due to the freedom of name choice, note that **Tr**HOSC(**C**) is closed under type-preserving renamings that preserve names from **C**.

### **3.6 Correctness and full abstraction**

We define two kinds of special configurations that will play an important role in spelling out correctness results for the HOSC[HOSC] LTS. Let Γ = {x_1 : σ_1, ..., x_k : σ_k}. A map ρ from {x_1, ..., x_k} to the set of abstract values will be called a Γ*-assignment* provided, for all 1 ≤ i ≠ j ≤ k, we have ρ(x_i) : σ_i and ν(ρ(x_i)) ∩ ν(ρ(x_j)) = ∅.

$$\begin{array}{ll}
\mathbf{C} = \langle M_1^{cwl},\, c,\, \emptyset,\, \emptyset,\, \{c\},\, \emptyset\rangle & \\
\ \xrightarrow{\tau^*} \langle \langle \mathrm{inc}, \mathrm{get}\rangle,\, c,\, \emptyset,\, \emptyset,\, \{c\},\, [b \mapsto \mathbf{ff}, x \mapsto 0]\rangle & \\
\ \xrightarrow{\bar{c}(\langle g_1, g_2\rangle)} \langle \gamma_1,\, \emptyset,\, \{c, g_1, g_2\},\, [b \mapsto \mathbf{ff}, x \mapsto 0]\rangle & \text{with } \gamma_1 = [g_1 \mapsto \mathrm{inc},\, g_2 \mapsto \mathrm{get}] \\
\ \xrightarrow{g_1(f_1, c_1)} \langle \mathrm{inc}\, f_1,\, c_1,\, \gamma_1,\, \emptyset,\, \phi_2,\, [b \mapsto \mathbf{ff}, x \mapsto 0]\rangle & \text{with } \phi_2 = \{c, g_1, g_2, f_1, c_1\} \\
\ \xrightarrow{\tau^*} \langle f_1();\, N,\, c_1,\, \gamma_1,\, \emptyset,\, \phi_2,\, [b \mapsto \mathbf{tt}, x \mapsto 0]\rangle & \text{with } N = (x := {!x} + 1;\, b := \mathbf{ff}) \\
\ \xrightarrow{\bar{f}_1((), c_2)} \langle \gamma_2,\, \xi,\, \phi_3,\, [b \mapsto \mathbf{tt}, x \mapsto 0]\rangle & \text{with } \gamma_2 = \gamma_1 \cdot [c_2 \mapsto \bullet; N], \\
& \quad \xi = [c_2 \mapsto c_1],\ \phi_3 = \phi_2 \uplus \{c_2\} \\
\ \xrightarrow{c_2(())} \langle ();\, N,\, c_1,\, \gamma_2,\, \xi,\, \phi_3,\, [b \mapsto \mathbf{tt}, x \mapsto 0]\rangle & \\
\ \xrightarrow{\tau^*} \langle (),\, c_1,\, \gamma_2,\, \xi,\, \phi_3,\, [b \mapsto \mathbf{ff}, x \mapsto 1]\rangle & \\
\ \xrightarrow{\bar{c}_1(())} \langle \gamma_2,\, \xi,\, \phi_3,\, [b \mapsto \mathbf{ff}, x \mapsto 1]\rangle & \\
\ \xrightarrow{c_2(())} \langle ();\, N,\, c_1,\, \gamma_2,\, \xi,\, \phi_3,\, [b \mapsto \mathbf{ff}, x \mapsto 1]\rangle & \\
\ \xrightarrow{\tau^*} \langle (),\, c_1,\, \gamma_2,\, \xi,\, \phi_3,\, [b \mapsto \mathbf{ff}, x \mapsto 2]\rangle & \\
\ \xrightarrow{\bar{c}_1(())} \langle \gamma_2,\, \xi,\, \phi_3,\, [b \mapsto \mathbf{ff}, x \mapsto 2]\rangle & \\
\ \xrightarrow{g_2((), c_3)} \langle \mathrm{get}\, (),\, c_3,\, \gamma_2,\, \xi,\, \phi_4,\, [b \mapsto \mathbf{ff}, x \mapsto 2]\rangle & \text{with } \phi_4 = \phi_3 \uplus \{c_3\} \\
\ \xrightarrow{\tau^*} \langle 2,\, c_3,\, \gamma_2,\, \xi,\, \phi_4,\, [b \mapsto \mathbf{ff}, x \mapsto 2]\rangle & \\
\ \xrightarrow{\bar{c}_3(2)} \langle \gamma_2,\, \xi,\, \phi_4,\, [b \mapsto \mathbf{ff}, x \mapsto 2]\rangle &
\end{array}$$

**Fig. 5.** Trace derivation in the HOSC[HOSC] LTS

**Definition 10 (Program configuration).** Given a $\Gamma$-assignment $\rho$, a cr-free HOSC term $\Gamma \vdash M : \tau$ and $c : \tau$, we define the active configuration $C^{\rho,c}_M$ by $C^{\rho,c}_M = \langle M\{\rho\}, c, \emptyset, \emptyset, \nu(\rho) \cup \{c\}, \emptyset\rangle$.

Note that traces from $\mathbf{Tr}_{\mathrm{HOSC}}(C^{\rho,c}_M)$ will be $(\nu(\rho) \cup \{c\}, \emptyset)$-traces.

**Definition 11.** The HOSC[HOSC] **trace semantics** of a cr-free HOSC term $\Gamma \vdash M : \tau$ is defined to be

$$\mathbf{Tr}\_{\mathrm{HOSC}}(\Gamma \vdash M : \tau) = \{((\rho, c), t) \mid \rho \text{ is a } \Gamma\text{-assignment},\ c : \tau,\ t \in \mathbf{Tr}\_{\mathrm{HOSC}}(C^{\rho,c}\_M)\}.$$

Example 5. Recall the term $\vdash M_1^{cwl} : \tau$ from Example 4, the trace $t_1$ and the configuration $\mathbf{C}$ such that $t_1 \in \mathbf{Tr}_{\mathrm{HOSC}}(\mathbf{C})$. Because $M_1^{cwl}$ is closed ($\Gamma = \emptyset$), the only $\Gamma$-assignment is the empty map $\emptyset$. Thus, $\mathbf{C} = C^{\emptyset,c}_{M_1^{cwl}}$, so $((\emptyset, c), t_1) \in \mathbf{Tr}_{\mathrm{HOSC}}(\vdash M_1^{cwl} : \tau)$.

Having defined active configurations associated with terms, we now define passive configurations associated with contexts. Let us fix $\diamond \in \mathsf{FNames}_{\mathsf{Unit} \to \mathsf{Unit}}$ and, for each $\sigma$, a continuation name $\circ_\sigma \in \mathsf{CNames}_\sigma$. Let $\circ = \bigcup_\sigma \{\circ_\sigma\}$. Intuitively, the name $\diamond$ will correspond to $\Downarrow_{\mathrm{err}}$ and the names $\circ_\sigma$ to $\Downarrow_{\mathrm{ter}}$.

Recall that $\widehat{\mathrm{err}}$ stands for $\mathrm{err} : \mathsf{Unit} \to \mathsf{Unit}$. Given a heap $h : (\Sigma; \widehat{\mathrm{err}})$, an evaluation context $\Sigma; \widehat{\mathrm{err}} \vdash K : \tau \to \tau'$ and a substitution $\Sigma; \widehat{\mathrm{err}} \vdash \gamma : \Gamma$ (as in the definition of $\lesssim^{\mathrm{HOSC(ciu)}}_{\mathrm{err}}$), let us replace every occurrence of $\mathsf{cont}_\sigma\, K'$ inside $h, K, \gamma$ with $\mathsf{cont}_\sigma\, \langle K', \circ_{\sigma'}\rangle$, if $K'$ has type $\sigma \to \sigma'$. Moreover, let us replace every occurrence of the variable $\mathrm{err}$ with the function name $\diamond$. This is done to adjust $h, K, \gamma$ to the extended syntax of the LTS: the upgraded versions are called $h_\diamond, \gamma_\diamond, K_\diamond$.

Next we define the set $\mathbf{AVal}_\Gamma(\gamma)$ of all disjoint decompositions of values from $\gamma_\diamond$ into abstract values, along with the corresponding matchings. Recall that $\Gamma = \{x_1 : \sigma_1, \cdots, x_k : \sigma_k\}$. Below $\vec{A_i}$ stands for $(A_1, \cdots, A_k)$, and $\vec{\gamma_i}$ for $(\gamma_1, \cdots, \gamma_k)$.

$$\mathbf{AVal}\_{\Gamma}(\gamma) = \{ \begin{array}{l} (\vec{A\_i}, \vec{\gamma\_i}) \mid (A\_i, \gamma\_i) \in \mathbf{AVal}\_{\sigma\_i}(\gamma\_\diamond(x\_i)), \ i = 1, \cdots, k; \\ \nu(A\_1), \cdots, \nu(A\_k) \text{ mutually disjoint and without } \diamondsuit \} \end{array}$$

**Definition 12 (Context configuration).** Given $\Sigma$, $h : (\Sigma; \widehat{\mathrm{err}})$, $\Sigma; \widehat{\mathrm{err}} \vdash K : \tau \to \tau'$, $\Sigma; \widehat{\mathrm{err}} \vdash \gamma : \Gamma$, $(\vec{A_i}, \vec{\gamma_i}) \in \mathbf{AVal}_\Gamma(\gamma)$ and $c : \tau$ ($c \notin \circ$), the corresponding passive configuration $\mathfrak{C}^{\vec{\gamma_i},c}_{h,K,\gamma}$ is defined by

$$\mathfrak{C}\_{h,K,\gamma}^{\vec{\gamma\_i},c} = \langle \biguplus\_{i=1}^{k} \gamma\_i \uplus \{c \mapsto K\_\diamond\},\ \{c \mapsto \circ\_{\tau'}\},\ \biguplus\_{i=1}^{k} \nu(A\_i) \uplus \{c\} \uplus \circ \uplus \{\diamond\},\ h\_\diamond \rangle.$$

Intuitively, the names $\nu(A_i)$ correspond to calling function values extracted from $\gamma$, whereas $c$ corresponds to $K$. Note that traces in $\mathbf{Tr}_{\mathrm{HOSC}}(\mathfrak{C}^{\vec{\gamma_i},c}_{h,K,\gamma})$ will be $(\circ \uplus \{\diamond\},\ \biguplus_{i=1}^k \nu(A_i) \uplus \{c\})$-traces.

In preparation for the next result, we introduce the following shorthands.


**Lemma 4 (Correctness).** Let $\Gamma \vdash M : \tau$ be a cr-free HOSC term, let $\Sigma, h, K, \gamma$ be as above, $(\vec{A_i}, \vec{\gamma_i}) \in \mathbf{AVal}_\Gamma(\gamma)$, and $c : \tau$ ($c \notin \circ$). Then


Moreover, $t$ satisfies $\nu(t) \cap (\circ \cup \{\diamond\}) = \emptyset$.

Intuitively, the lemma above confirms that the potential of a term to converge is determined by its traces. Accordingly, we have:

**Theorem 1 (Soundness).** For any cr-free HOSC terms $\Gamma \vdash M_1, M_2$, if $\mathbf{Tr}_{\mathrm{HOSC}}(\Gamma \vdash M_1) \subseteq \mathbf{Tr}_{\mathrm{HOSC}}(\Gamma \vdash M_2)$ then $\Gamma \vdash M_1 \lesssim^{\mathrm{HOSC(ciu)}}_{\mathrm{err}} M_2$.

To prove the converse, we need to know that every odd-length trace generated by a term actually participates in a contextual interaction. This will follow from the lemma below. Note that ⇓err relies on even-length traces from the context (Lemma 4).

**Lemma 5 (Definability).** Suppose $\phi \uplus \{\diamond\} \subseteq \mathsf{FNames}$ and $t$ is an even-length $(\circ \uplus \{\diamond\}, \phi \uplus \{c\})$-trace starting with an O-action. There exists a passive configuration $\mathbf{C}$ such that the even-length traces in $\mathbf{Tr}_{\mathrm{HOSC}}(\mathbf{C})$ are exactly the even-length prefixes of $t$ (along with all renamings that preserve types and $\phi \uplus \{c\} \uplus \circ \uplus \{\diamond\}$, cf. Remark 2). Moreover, $\mathbf{C} = \langle \gamma_\diamond \cdot [c \mapsto K_\diamond],\ \{c \mapsto \circ_{\tau'}\},\ \phi \uplus \{c\} \uplus \circ \uplus \{\diamond\},\ h_\diamond\rangle$, where $h, K, \gamma$ are built from HOSC syntax.

Proof (Sketch). The basic idea is to use references in order to record all continuation and function names introduced by the environment. For continuations, the use of call/cc<sup>τ</sup> is essential. Once stored in the heap, the names can be accessed by terms when needed in P-actions. The availability of throw and references to all O-continuations means that arbitrary answer actions can be scheduled when needed.

**Theorem 2 (Completeness).** For any cr-free HOSC terms $\Gamma \vdash M_1, M_2$, $\Gamma \vdash M_1 \lesssim^{\mathrm{HOSC(ciu)}}_{\mathrm{err}} M_2$ implies $\mathbf{Tr}_{\mathrm{HOSC}}(\Gamma \vdash M_1) \subseteq \mathbf{Tr}_{\mathrm{HOSC}}(\Gamma \vdash M_2)$.

Theorems 1, 2 (along with Lemmas 1, 2) imply the following full abstraction results.

**Corollary 1 (HOSC Full Abstraction).** Suppose $\Gamma \vdash M_1, M_2$ are cr-free HOSC terms. Then $\mathbf{Tr}_{\mathrm{HOSC}}(\Gamma \vdash M_1) \subseteq \mathbf{Tr}_{\mathrm{HOSC}}(\Gamma \vdash M_2)$ iff $\Gamma \vdash M_1 \lesssim^{\mathrm{HOSC}}_{\mathrm{err}} M_2$ iff $\Gamma \vdash M_1 \lesssim^{\mathrm{HOSC}}_{\mathrm{ter}} M_2$.

Example 6 (Callback with lock [4]). Recall the term $\vdash M_1^{cwl} : ((\mathsf{Unit} \to \mathsf{Unit}) \to \mathsf{Unit}) \times (\mathsf{Unit} \to \mathsf{Int})$ from Example 4, given in Figure 4. We had $t_1 = \bar{c}(\langle g_1, g_2\rangle)\ g_1(f_1, c_1)\ \bar{f}_1((), c_2)\ c_2(())\ \bar{c}_1(())\ c_2(())\ \bar{c}_1(())\ g_2((), c_3)\ \bar{c}_3(2) \in \mathbf{Tr}_{\mathrm{HOSC}}(C^{\emptyset,c}_{M_1^{cwl}})$.

Define $t_2$ to be $t_1$ except that its last action $\bar{c}_3(2)$ is replaced with $\bar{c}_3(1)$. Observe that $t_1 \in \mathbf{Tr}_{\mathrm{HOSC}}(C^{\emptyset,c}_{M_1^{cwl}}) \setminus \mathbf{Tr}_{\mathrm{HOSC}}(C^{\emptyset,c}_{M_2^{cwl}})$ and $t_2 \in \mathbf{Tr}_{\mathrm{HOSC}}(C^{\emptyset,c}_{M_2^{cwl}}) \setminus \mathbf{Tr}_{\mathrm{HOSC}}(C^{\emptyset,c}_{M_1^{cwl}})$, i.e. by the Corollary above the terms are incomparable wrt $\lesssim^{\mathrm{HOSC}}_{\mathrm{err}}$. However, they are equivalent wrt $\cong^{\mathbf{x}}_{\mathrm{err}}$ for $\mathbf{x} \in \{\mathrm{GOSC}, \mathrm{HOS}, \mathrm{GOS}\}$ [8].

The above Corollary also provides a handle to reason about equivalence via trace equivalence. Sometimes this can be done directly on the LTS, especially when γ can be kept bounded.

Example 7 (Counter [28]). For $i \in \{1, 2\}$, consider the terms $\vdash M_i : (\mathsf{Unit} \to \mathsf{Unit}) \times (\mathsf{Unit} \to \mathsf{Int})$ given by $M_i \equiv \mathsf{let}\ x = \mathsf{ref}\ 0\ \mathsf{in}\ \langle \mathrm{inc}_i, \mathrm{get}_i\rangle$, where $\mathrm{inc}_1 \equiv (\lambda y.\,x := {!x} + 1)$, $\mathrm{inc}_2 \equiv (\lambda y.\,x := {!x} - 1)$, $\mathrm{get}_1 \equiv \lambda z.\,!x$, $\mathrm{get}_2 \equiv \lambda z.\,-{!x}$. In this case, $\mathbf{Tr}_{\mathrm{HOSC}}(C^{\emptyset,c}_{M_i})$ contains (prefixes of) traces of the form $\bar{c}(\langle g, h\rangle)\, t$, where $t$ is built from segments of two kinds: either $g((), c_i)\ \bar{c}_i(())$ or $h((), c'_i)\ \bar{c}'_i(n)$, where the $c_i$s and $c'_i$s are pairwise different. Moreover, in the latter case, $n$ must be equal to the number of preceding actions of the form $g((), c_i)$. For this example, trace equality can be established by induction on the length of traces. Consequently, $M_1 \cong^{\mathrm{HOSC}}_{\mathrm{err}} M_2$.
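The equivalence can also be checked concretely. In the Python sketch below (our rendering, with a mutable cell standing in for the reference $x$), any experiment built from inc and get calls yields the same observable answers for both implementations, mirroring the invariant on $n$ above.

```python
def make_counter1():
    # M1: inc adds 1 and get reads x directly
    x = [0]
    return (lambda y=None: x.__setitem__(0, x[0] + 1),
            lambda z=None: x[0])

def make_counter2():
    # M2: inc subtracts 1 and get negates, so all observations coincide
    x = [0]
    return (lambda y=None: x.__setitem__(0, x[0] - 1),
            lambda z=None: -x[0])

observations = []
for make in (make_counter1, make_counter2):
    inc, get = make()
    obs = []
    for _ in range(3):
        inc()
        obs.append(get())   # get returns the number of preceding inc calls
    observations.append(obs)

assert observations[0] == observations[1] == [1, 2, 3]
```
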

# **4 GOSC[HOSC]**

Recall that GOSC is the fragment of HOSC in which general storage is restricted to values of ground type, i.e. arithmetic/boolean constants, the associated reference names, references to those names, and so on. In what follows, we are going to provide characterizations of $\lesssim^{\mathrm{GOSC}}_{\mathrm{err}}$ via trace inclusion. Recall that, by Lemma 2, $\lesssim^{\mathrm{GOSC}}_{\mathrm{err}} = \lesssim^{\mathrm{GOSC}}_{\mathrm{ter}}$. Note that we work in an asymmetric setting, with terms belonging to HOSC being more powerful than contexts.

We start off by identifying several technical consequences of the restriction to GOSC syntax. First we observe that GOSC internal reductions never contribute extra names.

**Lemma 6.** Suppose $(M, c, h) \rightarrow (M', c', h')$, where $M$ is a GOSC term and $h$ is a GOSC heap. Then $\nu(M) \cup \{c\} \supseteq \nu(M') \cup \{c'\}$.

Proof. By case analysis. All defining rules for $\rightarrow$, with the exception of the dereferencing rule $(K[!r], h) \rightarrow (K[h(r)], h)$, are easily seen to satisfy the Lemma (no function or continuation names are added). However, if the heap is restricted to storing elements of type $\iota$ (as in GOSC) then $h(r)$ will never contain a name, so the Lemma follows.

The lemma has interesting consequences for the shape of traces generated by the context configurations $\mathfrak{C}^{\vec{\gamma_i},c}_{h,K,\gamma}$ if they are built from GOSC syntax. Recall that P-actions have the form $\bar{f}(A, c')$ or $\bar{c}(A)$, where $f, c$ are names introduced by O. It turns out that, when $h, K, \gamma$ are restricted to GOSC, more can be said about the origin of the names in traces generated by $\mathfrak{C}^{\vec{\gamma_i},c}_{h,K,\gamma}$: they come from a restricted set of names introduced by O, which we identify below. The definition is based on following the justification structure of a trace: recall that one action is said to justify another if the former introduces a name that is used for communication in the latter.

**Definition 13.** Suppose $\phi \uplus \{\diamond\} \subseteq \mathsf{FNames}$ and $c \in \mathsf{CNames}$. Let $t$ be an odd-length $(\circ \uplus \{\diamond\}, \phi \uplus \{c\})$-trace starting with an O-action. The set $\mathsf{Vis}_P(t)$ of **P-visible names** of $t$ is defined as follows.

$$\begin{array}{lr}
\mathsf{Vis}\_P(t\ c'(A')) = \{\diamond\} \cup \circ \cup \nu(A') & c' = c \\
\mathsf{Vis}\_P(t\ \bar{f}''(A'', c')\ t'\ c'(A')) = \mathsf{Vis}\_P(t) \cup \nu(A') & c' \neq c \\
\mathsf{Vis}\_P(t\ f'(A', c')) = \{\diamond\} \cup \circ \cup \nu(A') \cup \{c'\} & f' \in \phi \\
\mathsf{Vis}\_P(t\ \bar{f}''(A'', c'')\ t'\ f'(A', c')) = \mathsf{Vis}\_P(t) \cup \nu(A') \cup \{c'\} & f' \in \nu(A'') \\
\mathsf{Vis}\_P(t\ \bar{c}''(A'')\ t'\ f'(A', c')) = \mathsf{Vis}\_P(t) \cup \nu(A') \cup \{c'\} & f' \in \nu(A'')
\end{array}$$

Note that, in the inductive cases, the definition follows links between names introduced by P and the points of their introduction; names introduced in between are ignored. Readers familiar with game semantics will notice the similarity to the notion of P-view [12].

Next we specify a property of traces that will turn out to be satisfied by configurations corresponding to GOSC contexts.

**Definition 14.** Suppose $\phi \uplus \{\diamond\} \subseteq \mathsf{FNames}$ and $c \in \mathsf{CNames}$. Let $t$ be a $(\circ \uplus \{\diamond\}, \phi \uplus \{c\})$-trace starting with an O-action. $t$ is called **P-visible** if

**–** for any even-length prefix $t'\ \bar{f}(A, c')$ of $t$, we have $f \in \mathsf{Vis}_P(t')$,

**–** for any even-length prefix $t'\ \bar{c}'(A)$ of $t$, we have $c' \in \mathsf{Vis}_P(t')$.

**Lemma 7.** Consider $\mathbf{C} = \mathfrak{C}^{\vec{\gamma_i},c}_{h,K,\gamma}$, where $h, K, \gamma$ are from GOSC and $(\vec{A_i}, \vec{\gamma_i}) \in \mathbf{AVal}_\Gamma(\gamma)$. Then all traces in $\mathbf{Tr}_{\mathrm{HOSC}}(\mathbf{C})$ are P-visible.

The Lemma above shows that contextual interactions with GOSC contexts rely on restricted traces. We shall now modify the HOSC[HOSC] LTS to capture the restriction. Note that, from the perspective of the term, the above constraint is a constraint on the use of names by O (context), so we need to talk about O-available names instead. This dual notion is defined below.

**Definition 15.** Suppose $\phi \subseteq \mathsf{FNames}$ and $c \in \mathsf{CNames}$. Let $t$ be a $(\phi \uplus \{c\}, \emptyset)$-trace of odd length. The set $\mathsf{Vis}_O(t)$ of **O-visible names** of $t$ is defined as follows.

$$\begin{array}{lr}
\mathsf{Vis}\_O(t\ \bar{c}'(A')) = \nu(A') & c' = c \\
\mathsf{Vis}\_O(t\ f''(A'', c')\ t'\ \bar{c}'(A')) = \mathsf{Vis}\_O(t) \cup \nu(A') & c' \neq c \\
\mathsf{Vis}\_O(t\ \bar{f}'(A', c')) = \nu(A') \cup \{c'\} & f' \in \phi \\
\mathsf{Vis}\_O(t\ f''(A'', c'')\ t'\ \bar{f}'(A', c')) = \mathsf{Vis}\_O(t) \cup \nu(A') \cup \{c'\} & f' \in \nu(A'') \\
\mathsf{Vis}\_O(t\ c''(A'')\ t'\ \bar{f}'(A', c')) = \mathsf{Vis}\_O(t) \cup \nu(A') \cup \{c'\} & f' \in \nu(A'')
\end{array}$$

Analogously, a $(\phi \uplus \{c\}, \emptyset)$-trace $t$ is **O-visible** if, for any even-length prefix $t'\ f(A, c')$ of $t$, we have $f \in \mathsf{Vis}_O(t')$ and, for any even-length prefix $t'\ c'(A)$ of $t$, we have $c' \in \mathsf{Vis}_O(t')$.

Example 8. Recall the trace

$$t\_1 = \bar{c}(\langle g\_1, g\_2\rangle)\ g\_1(f\_1, c\_1)\ \bar{f}\_1((), c\_2)\ c\_2(())\ \bar{c}\_1(())\ c\_2(())\ \bar{c}\_1(())\ g\_2((), c\_3)\ \bar{c}\_3(2)$$

from previous examples. Observe that

$$\begin{array}{c}
\mathsf{Vis}\_O(\bar{c}(\langle g\_1, g\_2\rangle)\ g\_1(f\_1, c\_1)\ \bar{f}\_1((), c\_2)) = \{g\_1, g\_2, c\_2\} \\
\mathsf{Vis}\_O(\bar{c}(\langle g\_1, g\_2\rangle)\ g\_1(f\_1, c\_1)\ \bar{f}\_1((), c\_2)\ c\_2(())\ \bar{c}\_1(())) = \{g\_1, g\_2\}
\end{array}$$

Consequently, the first use of $c_2(())$ in $t_1$ does not violate O-visibility, but the second one does.

In Figure 6, we present a new LTS, called the GOSC[HOSC] LTS, which will turn out to capture $\lesssim^{\mathrm{GOSC}}_{\mathrm{err}}$ through trace inclusion. It is obtained from the HOSC[HOSC] LTS by restricting O-actions to those that rely on O-visible names. Technically, this is done by enriching configurations with an additional component $F$, which maintains historical information about O-available names immediately before each O-action. After each P-action, $F$ is accessed to calculate the current set $\mathcal{V}$ of O-available names according to the definition of O-availability, and only O-actions compatible with O-availability are allowed to proceed (due

$$\begin{array}{ll}
(P\tau)\ \langle M, c, \gamma, \xi, \phi, h, F\rangle \xrightarrow{\tau} \langle N, c', \gamma, \xi, \phi, h', F\rangle & \text{when } (M, c, h) \rightarrow (N, c', h') \\[2pt]
(PA)\ \langle V, c, \gamma, \xi, \phi, h, F\rangle \xrightarrow{\bar{c}(A)} \langle \gamma \cdot \gamma', \xi, \phi \uplus \nu(A), h, F, F(c) \uplus \nu(A)\rangle & \text{when } c : \sigma \text{ and } (A, \gamma') \in \mathbf{AVal}\_{\sigma}(V) \\[2pt]
(PQ)\ \langle K[fV], c, \gamma, \xi, \phi, h, F\rangle \xrightarrow{\bar{f}(A, c')} \langle \gamma \cdot \gamma' \cdot [c' \mapsto K], \xi \cdot [c' \mapsto c], \phi \uplus \phi', h, F, F(f) \uplus \phi'\rangle & \text{when } f : \sigma \rightarrow \sigma',\ (A, \gamma') \in \mathbf{AVal}\_{\sigma}(V), \\
& \quad c' : \sigma' \text{ and } \phi' = \nu(A) \uplus \{c'\} \\[2pt]
(OA)\ \langle \gamma, \xi, \phi, h, F, \mathcal{V}\rangle \xrightarrow{c(A)} \langle K[A], c', \gamma, \xi, \phi \uplus \nu(A), h, F \cdot [\nu(A) \mapsto \mathcal{V}]\rangle & \text{when } c \in \mathcal{V},\ c : \sigma,\ A : \sigma,\ \gamma(c) = K,\ \xi(c) = c' \\[2pt]
(OQ)\ \langle \gamma, \xi, \phi, h, F, \mathcal{V}\rangle \xrightarrow{f(A, c)} \langle VA, c, \gamma, \xi, \phi \uplus \phi', h, F \cdot [\phi' \mapsto \mathcal{V}]\rangle & \text{when } f \in \mathcal{V},\ f : \sigma \rightarrow \sigma',\ A : \sigma,\ c : \sigma', \\
& \quad \gamma(f) = V \text{ and } \phi' = \nu(A) \uplus \{c\}
\end{array}$$

Given $N \subseteq \mathsf{Names}$, $[N \mapsto \mathcal{V}]$ stands for the map $[n \mapsto \mathcal{V} \mid n \in N]$.

**Fig. 6.** GOSC[HOSC] LTS

to the $f \in \mathcal{V}$, $c \in \mathcal{V}$ side conditions). We write $\mathbf{Tr}_{\mathrm{GOSC}}(\mathbf{C})$ for the set of traces generated from $\mathbf{C}$ in the GOSC[HOSC] LTS.

Recall that, given a $\Gamma$-assignment $\rho$, a term $\Gamma \vdash M : \tau$ and $c \in \mathsf{CNames}_\tau$, the active configuration $C^{\rho,c}_M$ was defined by $C^{\rho,c}_M = \langle M\{\rho\}, c, \emptyset, \emptyset, \nu(\rho) \cup \{c\}, \emptyset\rangle$. We upgrade it to the new LTS by initializing the new component to the empty map: $C^{\rho,c}_{M,\mathrm{vis}} = \langle M\{\rho\}, c, \emptyset, \emptyset, \nu(\rho) \cup \{c\}, \emptyset, \emptyset\rangle$.

**Definition 16.** The GOSC[HOSC] **trace semantics** of a cr-free HOSC term $\Gamma \vdash M : \tau$ is defined by $\mathbf{Tr}_{\mathrm{GOSC}}(\Gamma \vdash M : \tau) = \{((\rho, c), t) \mid \rho \text{ is a } \Gamma\text{-assignment},\ c : \tau,\ t \in \mathbf{Tr}_{\mathrm{GOSC}}(C^{\rho,c}_{M,\mathrm{vis}})\}$.

By construction, it follows that

**Lemma 8.** $t \in \mathbf{Tr}_{\mathrm{GOSC}}(C^{\rho,c}_{M,\mathrm{vis}})$ iff $t \in \mathbf{Tr}_{\mathrm{HOSC}}(C^{\rho,c}_{M})$ and $t$ is O-visible.

Noting that the witness trace $t$ from Lemma 4 is O-visible iff $t^\perp\, \bar{\diamond}((), c')$ is P-visible, we can conclude that, for GOSC, the traces relevant to $\Downarrow_{\mathrm{err}}$ are O-visible, which yields:

**Theorem 3 (Soundness).** For any cr-free HOSC terms $\Gamma \vdash M_1, M_2$, if $\mathbf{Tr}_{\mathrm{GOSC}}(\Gamma \vdash M_1) \subseteq \mathbf{Tr}_{\mathrm{GOSC}}(\Gamma \vdash M_2)$ then $\Gamma \vdash M_1 \lesssim^{\mathrm{GOSC(ciu)}}_{\mathrm{err}} M_2$.

To prove the converse, we need a new definability result. This time we are only allowed to use GOSC syntax, but the target is also more modest: we are only aiming to capture P-visible traces.

**Lemma 9 (Definability).** Suppose $\phi \uplus \{\diamond\} \subseteq \mathsf{FNames}$ and $t$ is an even-length P-visible $(\circ \uplus \{\diamond\}, \phi \uplus \{c\})$-trace starting with an O-action. There exists a passive configuration $\mathbf{C}$ such that the even-length traces in $\mathbf{Tr}_{\mathrm{HOSC}}(\mathbf{C})$ are exactly the even-length prefixes of $t$ (along with all renamings that preserve types and $\phi \uplus \{c\} \uplus \circ \uplus \{\diamond\}$). Moreover, $\mathbf{C} = \langle \gamma_\diamond \cdot [c \mapsto K_\diamond],\ \{c \mapsto \circ_{\tau'}\},\ \phi \uplus \{c\} \uplus \circ \uplus \{\diamond\},\ h_\diamond\rangle$, where $h, K, \gamma$ are built from GOSC syntax.

Proof (Sketch). This time we cannot rely on references to recall, on demand, all continuation and function names introduced by the environment. However, because $t$ is P-visible, it turns out that the uses of these names can be captured through variable bindings ($\lambda x.\cdots$ for function names and $\mathsf{call/cc}_\tau(x.\ldots)$ for continuation names). Using $\mathsf{throw}$, we can then force an arbitrary answer action, as long as it uses a P-available name. To select the right action at each step, we branch on the value of a single global reference of type $\mathsf{ref}\ \mathsf{Int}$ that keeps track of the number of steps simulated so far.

Completeness now follows because, for a potential O-visible witness $t$ from Lemma 4, one can create a corresponding context by invoking the Definability result for $t^\perp\, \bar{\diamond}((), c')$. It is crucial that the addition of $\bar{\diamond}((), c')$ does not break P-visibility ($\diamond$ is always P-visible).

**Theorem 4 (Completeness).** For any cr-free HOSC terms $\Gamma \vdash M_1, M_2$, if $\Gamma \vdash M_1 \lesssim^{\mathrm{GOSC(ciu)}}_{\mathrm{err}} M_2$ then $\mathbf{Tr}_{\mathrm{GOSC}}(\Gamma \vdash M_1) \subseteq \mathbf{Tr}_{\mathrm{GOSC}}(\Gamma \vdash M_2)$.

Altogether, Theorems 3, 4 (along with Lemma 1) imply the following result.

**Corollary 2 (GOSC Full Abstraction).** Suppose $\Gamma \vdash M_1, M_2$ are cr-free HOSC terms. $\mathbf{Tr}_{\mathrm{GOSC}}(\Gamma \vdash M_1) \subseteq \mathbf{Tr}_{\mathrm{GOSC}}(\Gamma \vdash M_2)$ iff $\Gamma \vdash M_1 \lesssim^{\mathrm{GOSC(ciu)}}_{\mathrm{err}} M_2$ iff $\Gamma \vdash M_1 \lesssim^{\mathrm{GOSC}}_{\mathrm{err}} M_2$.

Example 9. In the Callback-with-lock example (Example 6), we exhibited traces $t_1, t_2$ that separated $M_1^{cwl}$ and $M_2^{cwl}$ with respect to $\lesssim^{\mathrm{HOSC}}_{\mathrm{err}}$. Example 8 shows that neither trace is O-visible, i.e. they belong to neither $\mathbf{Tr}_{\mathrm{GOSC}}(\Gamma \vdash M_1)$ nor $\mathbf{Tr}_{\mathrm{GOSC}}(\Gamma \vdash M_2)$. Thus, the two traces cannot be used to separate $M_1^{cwl}, M_2^{cwl}$ with respect to $\lesssim^{\mathrm{GOSC}}_{\mathrm{err}}$. As already mentioned, such a separation is in fact impossible: we have $M_1^{cwl} \cong^{\mathrm{GOSC}}_{\mathrm{err}} M_2^{cwl}$.

Example 10 (Well-bracketed state change [4]). Consider the following two terms

$$\begin{array}{l} M\_1^{wbsc} \triangleq \text{let} \, x = \text{ref} \, 0 \, \text{in} \, \lambda f. (x := 0; f(); x := 1; f(); !x) \\ M\_2^{wbsc} \triangleq \lambda f. (f(); f(); 1). \end{array}$$

of type τ = (Unit → Unit) → Int, let

$$t\_3 = \bar{c}(g)\ g(f\_1, c\_1)\ \bar{f}\_1((), c\_2)\ c\_2(())\ \bar{f}\_1((), c\_3)\ g(f\_2, c\_4)\ \bar{f}\_2((), c\_5)\ c\_3(())\ \bar{c}\_1(0)$$

and let $t_4$ be obtained from $t_3$ by changing the 0 in the last action to 1. One can check that both traces are O-visible: in particular, the action $c_3(())$ is not a violation because

$$\mathsf{Vis}\_O(\bar{c}(g)\ g(f\_1, c\_1)\ \bar{f}\_1((), c\_2)\ c\_2(())\ \bar{f}\_1((), c\_3)\ g(f\_2, c\_4)\ \bar{f}\_2((), c\_5)) = \{g, c\_3, c\_5\}.$$

Moreover, $t_3 \in \mathbf{Tr}_{\mathrm{GOSC}}(C^{\emptyset,c}_{M_1^{wbsc}}) \setminus \mathbf{Tr}_{\mathrm{GOSC}}(C^{\emptyset,c}_{M_2^{wbsc}})$ and $t_4 \in \mathbf{Tr}_{\mathrm{GOSC}}(C^{\emptyset,c}_{M_2^{wbsc}}) \setminus \mathbf{Tr}_{\mathrm{GOSC}}(C^{\emptyset,c}_{M_1^{wbsc}})$. By the Corollary above, we can conclude that $M_1^{wbsc}, M_2^{wbsc}$ are incomparable wrt $\lesssim^{\mathrm{GOSC}}_{\mathrm{err}}$. However, they turn out to be both $\cong^{\mathrm{HOS}}_{\mathrm{err}}$- and $\cong^{\mathrm{GOS}}_{\mathrm{err}}$-equivalent.
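The role of well-bracketing can be seen concretely. In the Python sketch below (our rendering, with a mutable cell for the reference $x$), even a re-entrant callback cannot distinguish the two terms: each call of $f$ runs to completion before control returns, so $x$ is always reset to 1 before the final read. The separating trace $t_3$ needs control operators in order to answer $c_3$ while the inner call is still pending.

```python
def make_wbsc1():
    # M1_wbsc = let x = ref 0 in lambda f. (x := 0; f(); x := 1; f(); !x)
    x = [0]
    def m(f):
        x[0] = 0
        f()
        x[0] = 1
        f()
        return x[0]
    return m

def make_wbsc2():
    # M2_wbsc = lambda f. (f(); f(); 1)
    def m(f):
        f()
        f()
        return 1
    return m

m1, m2 = make_wbsc1(), make_wbsc2()

def run(m):
    # a well-bracketed but re-entrant context: f calls back into m once
    return m(lambda: m(lambda: None))

assert run(m1) == run(m2) == 1
```
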

# **5 HOS[HOSC]**

Recall that HOS is the fragment of HOSC that does not feature continuation types and the associated syntax. In what follows, we are going to provide alternative characterisations of $\lesssim^{\mathrm{HOS}}_{\mathrm{err}}$ and $\lesssim^{\mathrm{HOS}}_{\mathrm{ter}}$ in terms of trace inclusion and complete trace inclusion respectively.

We start off by identifying several technical consequences of the restriction to HOS syntax. First we observe that HOS internal reductions never change the associated continuation name.

**Lemma 10.** If $(M, c, h) \rightarrow (M', c', h')$, $M$ is a HOS term and $h$ is a HOS heap, then $c = c'$.

Proof. The only rule that could change c is the rule for throw, but it is not part of HOS.

The lemma has a bearing on the shape of traces generated by the (passive) configurations $\mathfrak{C}^{\vec{\gamma_i},c}_{h,K,\gamma}$ corresponding to HOS contexts. In the presence of $\mathsf{throw}$ and storage for continuations, it was possible for P to play answers involving arbitrary continuation names introduced by O. By Lemma 10, in HOS this will be restricted to the continuation name of the current configuration, which restricts the shape of possible traces. Below we identify the continuation name $\mathsf{top}_P(t)$ that becomes the relevant name after trace $t$. If the last move was an O-question, then the continuation name introduced by that move becomes that name. Otherwise, we track a chain of answers and questions, similarly to the definition of P-visibility.

Observe that, because $h, K, \gamma$ are from HOS, $\mathfrak{C}^{\vec{\gamma_i},c}_{h,K,\gamma}$ will generate $(\{\circ_{\tau'}, \diamond\}, \phi \uplus \{c\})$-traces, where $\tau'$ is the result type of $K$, because $h_\diamond = h$, $K_\diamond = K$, $\gamma_\diamond = \gamma$.

**Definition 17.** Suppose $\phi \uplus \{\diamond\} \subseteq \mathsf{FNames}$ and $c \in \mathsf{CNames}$. Let $t$ be a $(\{\circ_{\tau'}, \diamond\}, \phi \uplus \{c\})$-trace of odd length starting with an O-action. The continuation name $\mathsf{top}_P(t)$ is defined as follows.

$$\begin{array}{c} \operatorname{top}\_P(\operatorname{t}\ c(A)) = \circ\_{\tau'}\\ \operatorname{top}\_P(\operatorname{t}\_1\ \overline{f}(A'', c')\ \operatorname{t}\_2\ c'(A')) = \operatorname{top}\_P(\operatorname{t}\_1)\\ \operatorname{top}\_P(\operatorname{t}\ f(A', c')) = c' \end{array}$$

We say that a $(\{\circ_{\tau'}, \diamond\}, \phi \uplus \{c\})$-trace $t$ starting with an O-action is **P-bracketed** if, for any prefix $t'\ \bar{c}'(A)$ of $t$ (i.e. any prefix ending with a P-answer), we have $c' = \mathsf{top}_P(t')$.

**Lemma 11.** Consider $\mathbf{C} = \mathfrak{C}^{\vec{\gamma_i},c}_{h,K,\gamma}$, where $h, K, \gamma$ are from HOS and $(\vec{A_i}, \vec{\gamma_i}) \in \mathbf{AVal}_\Gamma(\gamma)$. Then all traces in $\mathbf{Tr}_{\mathrm{HOSC}}(\mathbf{C})$ are P-bracketed.

The Lemma above characterizes the restrictive nature of contextual interactions with HOS contexts. Next we shall constrain the HOSC[HOSC] LTS accordingly to capture the restriction. Note that, from the point of view of the term, the above-mentioned constraint concerns the use of continuation names by O (the context), so we need to talk about O-bracketing instead. This dual notion of "a top name for O" is specified below.

$$\begin{array}{ll}
(P\tau)\ \langle M, c, \gamma, \xi, \phi, h\rangle \xrightarrow{\tau} \langle N, c', \gamma, \xi, \phi, h'\rangle & \text{when } (M, c, h) \rightarrow (N, c', h') \\[2pt]
(PA)\ \langle V, c, \gamma, \xi, \phi, h\rangle \xrightarrow{\bar{c}(A)} \langle \gamma \cdot \gamma', \xi, \phi \uplus \nu(A), h, c'\rangle & \text{when } c : \sigma,\ (A, \gamma') \in \mathbf{AVal}\_{\sigma}(V),\ \xi(c) = c' \\[2pt]
(PQ)\ \langle K[fV], c, \gamma, \xi, \phi, h\rangle \xrightarrow{\bar{f}(A, c')} \langle \gamma \cdot \gamma' \cdot [c' \mapsto K], \xi \cdot [c' \mapsto c], \phi \uplus \nu(A) \uplus \{c'\}, h, c'\rangle & \text{when } f : \sigma \rightarrow \sigma',\ (A, \gamma') \in \mathbf{AVal}\_{\sigma}(V),\ c' : \sigma' \\[2pt]
(OA)\ \langle \gamma, \xi, \phi, h, c''\rangle \xrightarrow{c(A)} \langle K[A], c', \gamma, \xi, \phi \uplus \nu(A), h\rangle & \text{when } c = c'',\ c : \sigma,\ A : \sigma,\ \gamma(c) = K,\ \xi(c) = c' \\[2pt]
(OQ)\ \langle \gamma, \xi, \phi, h, c''\rangle \xrightarrow{f(A, c)} \langle VA, c, \gamma, \xi \cdot [c \mapsto c''], \phi \uplus \nu(A) \uplus \{c\}, h\rangle & \text{when } f : \sigma \rightarrow \sigma',\ A : \sigma,\ c : \sigma',\ \gamma(f) = V
\end{array}$$

**Fig. 7.** HOS[HOSC] LTS

**Definition 18.** Suppose $\phi \subseteq \mathsf{FNames}$ and $c \in \mathsf{CNames}$. Let $t$ be a $(\phi \uplus \{c\}, \emptyset)$-trace of odd length. The continuation name $\mathsf{top}_O(t)$ is defined as follows. In the first case, the value is $\bot$ (representing "none"), because $c$ is the top continuation passed by the environment to the term (once it gets answered, there is nothing left to answer).

$$\begin{array}{c} \operatorname{top}\_O(t \; \bar{c}(A)) = \bot\\ \operatorname{top}\_O(t\_1 \; f(A^{\prime \prime}, c^{\prime}) \; t\_2 \; \bar{c}^{\prime}(A^{\prime})) = \operatorname{top}\_O(t\_1) \\ \operatorname{top}\_O(t \; \bar{f}(A^{\prime}, c^{\prime})) = c^{\prime} \end{array}$$

We say that a $(\phi \uplus \{c\}, \emptyset)$-trace $t$ is **O-bracketed** if, for any prefix $t'\ c'(A)$ of $t$ (i.e. any prefix ending with an O-answer), we have $c' = \mathsf{top}_O(t')$.

In Figure 7, we present a new LTS, called the HOS[HOSC] LTS, which will turn out to capture $\lesssim^{\mathrm{HOS}}_{\mathrm{err}}$. It is obtained from the HOSC[HOSC] LTS by restricting O-actions to those that satisfy O-bracketing. Technically, this is done by enriching passive configurations with a component storing the current value of $\mathsf{top}_O(t)$. In order to maintain this information, we need to know which continuation will become the top one if P plays an answer. This can be done with a map that sends continuations introduced by O to other continuations. Because its flavour is similar to that of $\xi$ (a map on continuations introduced by P), we integrate this information into $\xi$. The equality side condition in $(OA)$ then enforces O-bracketing. We shall write $\mathbf{Tr}_{\mathrm{HOS}}(\mathbf{C})$ for the set of traces generated from $\mathbf{C}$ in the HOS[HOSC] LTS.

Recall that, given a $\Gamma$-assignment $\rho$, a term $\Gamma \vdash M : \tau$ and $c : \tau$, the active configuration $C^{\rho,c}_M$ was defined by $C^{\rho,c}_M = \langle M\{\rho\}, c, \emptyset, \emptyset, \nu(\rho) \cup \{c\}, \emptyset\rangle$. We upgrade it to the new LTS by setting $C^{\rho,c}_{M,\mathrm{bra}} = \langle M\{\rho\}, c, \emptyset, [c \mapsto \bot], \nu(\rho) \cup \{c\}, \emptyset\rangle$. This initializes $\xi$ in such a way that, after $\bar{c}(A)$ is played, the extra component will be set to $\bot$, where $\bot$ is a special element not in $\mathsf{CNames}$.

**Definition 19.** The HOS[HOSC] **trace semantics** of a cr-free HOSC term $\Gamma \vdash M : \tau$ is defined to be $\mathbf{Tr}_{\mathrm{HOS}}(\Gamma \vdash M : \tau) = \{((\rho, c), t) \mid \rho \text{ is a } \Gamma\text{-assignment},\ c : \tau,\ t \in \mathbf{Tr}_{\mathrm{HOS}}(C^{\rho,c}_{M,\mathrm{bra}})\}$.

By construction, it follows that

**Lemma 12.** $t \in \mathbf{Tr}_{\mathrm{HOS}}(C^{\rho,c}_{M,\mathrm{bra}})$ iff $t \in \mathbf{Tr}_{\mathrm{HOSC}}(C^{\rho,c}_{M})$ and $t$ is O-bracketed.

Noting that the witness trace $t$ from Lemma 4 is O-bracketed iff $t^\perp\, \bar{\diamond}((), c')$ is P-bracketed, we can conclude that, for HOS, the traces relevant to $\Downarrow_{\mathrm{err}}$ are O-bracketed, which yields:

**Theorem 5 (Soundness).** For any cr-free HOSC terms $\Gamma \vdash M_1, M_2$, if $\mathbf{Tr}_{\mathrm{HOS}}(\Gamma \vdash M_1) \subseteq \mathbf{Tr}_{\mathrm{HOS}}(\Gamma \vdash M_2)$ then $\Gamma \vdash M_1 \lesssim^{\mathrm{HOS(ciu)}}_{\mathrm{err}} M_2$.

For the converse, we establish another definability result, this time for P-bracketed traces.

**Lemma 13 (Definability).** Suppose $\phi \uplus \{\diamond\} \subseteq \mathsf{FNames}$ and $t$ is an even-length P-bracketed $(\{\circ_{\tau'}, \diamond\}, \phi \uplus \{c\})$-trace starting with an O-action. There exists a passive configuration $\mathbf{C}$ such that the even-length traces in $\mathbf{Tr}_{\mathrm{HOSC}}(\mathbf{C})$ are exactly the even-length prefixes of $t$ (along with all renamings that preserve types and $\phi \uplus \{c, \circ_{\tau'}, \diamond\}$). Moreover, $\mathbf{C} = \langle \gamma \cdot [c \mapsto K],\ \{c \mapsto \circ_{\tau'}\},\ \phi \uplus \{c, \circ_{\tau'}, \diamond\},\ h\rangle$, where $h, K, \gamma$ are built from HOS syntax.

Proof (Sketch). Our argument for HOSC is structured in such a way that, for a P-bracketed trace, there is no need for continuations (throwing and continuation capture are not necessary).

Completeness now follows because, for a potential witness trace $t$ from Lemma 4, one can create a corresponding context by invoking the Definability result for $t^\perp\, \bar{\diamond}((), c')$. It is crucial that the addition of $\bar{\diamond}((), c')$ does not break P-bracketing (it does not, because the action is a question).

**Theorem 6 (Completeness).** For any cr-free HOSC terms $\Gamma \vdash M_1, M_2$, if $\Gamma \vdash M_1 \lesssim^{\mathrm{HOS(ciu)}}_{\mathrm{err}} M_2$ then $\mathbf{Tr}_{\mathrm{HOS}}(\Gamma \vdash M_1) \subseteq \mathbf{Tr}_{\mathrm{HOS}}(\Gamma \vdash M_2)$.

Altogether, Theorems 5, 6 (along with Lemma 1) imply the following result.

**Corollary 3 (HOS Full Abstraction).** Suppose $\Gamma \vdash M_1, M_2$ are cr-free HOSC terms. Then $\mathbf{Tr}_{\mathrm{HOS}}(\Gamma \vdash M_1) \subseteq \mathbf{Tr}_{\mathrm{HOS}}(\Gamma \vdash M_2)$ iff $\Gamma \vdash M_1 \lesssim^{\mathrm{HOS(ciu)}}_{\mathrm{err}} M_2$ iff $\Gamma \vdash M_1 \lesssim^{\mathrm{HOS}}_{\mathrm{err}} M_2$.

Example 11 (Assignment/callback commutation [27]). For $i \in \{1, 2\}$, let $f : \mathsf{Unit} \to \mathsf{Unit} \vdash M_i : \mathsf{Unit} \to \mathsf{Unit}$ be defined by:

$$\begin{array}{l} M\_1 \triangleq \mathsf{let}\ n = \mathsf{ref}\ 0\ \mathsf{in}\ \lambda y^{\mathsf{Unit}}.\ \mathsf{if}\ (!n > 0)\ ()\ (n := 1;\ f()), \\ M\_2 \triangleq \mathsf{let}\ n = \mathsf{ref}\ 0\ \mathsf{in}\ \lambda y^{\mathsf{Unit}}.\ \mathsf{if}\ (!n > 0)\ ()\ (f();\ n := 1). \end{array}$$

Operationally, one can see that $f \vdash M_1 \not\lesssim^{\mathrm{HOS}}_{\mathrm{err}} M_2$ due to the following HOS context: $\mathsf{let}\ r = \mathsf{ref}\ (\lambda y.y)\ \mathsf{in}\ (\mathsf{let}\ f = \lambda y.(!r)()\ \mathsf{in}\ (r := \bullet;\ (!r)()));\ \mathrm{err}()$. In our framework, this is confirmed by the trace

$$t\_5 \ = \ \bar{c}(g)\ \ g((), c\_1)\ \ \bar{f}((), c\_2)\ \ g((), c\_3)\ \ \bar{c}\_3(()),$$

which is in $\mathbf{Tr}_{\mathrm{HOS}}(C^{\rho,c}_{M_1}) \setminus \mathbf{Tr}_{\mathrm{HOS}}(C^{\rho,c}_{M_2})$. On the other hand,

$$t\_6 \ = \ \bar{c}(g)\ \ g((), c\_1)\ \ \bar{f}((), c\_2)\ \ g((), c\_3)\ \ \bar{f}((), c\_4)$$

is in $\mathbf{Tr}_{\mathrm{HOS}}(C^{\rho,c}_{M_2}) \setminus \mathbf{Tr}_{\mathrm{HOS}}(C^{\rho,c}_{M_1})$, so the terms are incomparable. Note, however, that both traces break O-visibility: specifically, we have

$$\mathsf{Vis}\_O(\bar{c}(g)\,\,g((),c\_1)\,\,\bar{f}((),c\_2)) = \{c\_2\},$$

so the $g((), c_2)$ action violates the condition. Consequently, the traces do not preclude $f \vdash M_1 \cong^{\mathbf{x}}_{\mathsf{err}} M_2$ for $\mathbf{x} \in \{\text{GOSC}, \text{GOS}\}$.

For $\mathbf{x} \in \{\text{HOSC}, \text{GOSC}\}$, $\lesssim^{\mathbf{x}}_{\mathsf{err}}$ and $\lesssim^{\mathbf{x}}_{\mathsf{ter}}$ coincide. Intuitively, this is because the presence of continuations in the context makes it possible to escape at any point. In contrast, for HOS, the context must run to completion in order to terminate.

At the technical level, one can appreciate the difference when trying to transfer our results for $\lesssim^{\mathsf{HOS(ciu)}}_{\mathsf{err}}$ to $\lesssim^{\mathsf{HOS(ciu)}}_{\mathsf{ter}}$. Recall that, according to Lemma 4, $\Downarrow_{\mathsf{ter}}$ relies on a witness trace t such that the context configuration generates $t^{\perp}$ and then terminates via internal steps. In HOS, the latter must satisfy P-bracketing, so we need $\mathsf{top}_P(t^{\perp}) = \bot$. Note that this is equivalent to $\mathsf{top}_O(t) = \bot$. Consequently, only such traces are relevant to observing $\Downarrow_{\mathsf{ter}}$.

We shall call an odd-length O-bracketed $(\phi \uplus \{c\}, \emptyset)$-trace t *complete* if $\mathsf{top}_O(t) = \bot$. Let us write $\mathbf{Tr}_{\mathsf{HOS}}(\Gamma \vdash M_1) \subseteq_c \mathbf{Tr}_{\mathsf{HOS}}(\Gamma \vdash M_2)$ if we have $((\rho, c), t) \in \mathbf{Tr}_{\mathsf{HOS}}(\Gamma \vdash M_2)$ whenever $((\rho, c), t) \in \mathbf{Tr}_{\mathsf{HOS}}(\Gamma \vdash M_1)$ and t is complete. Following our methodology, one can then show:

**Theorem 7 (**HOS **Full Abstraction for** $\lesssim^{\mathsf{HOS}}_{\mathsf{ter}}$**).** Suppose $\Gamma \vdash M_1, M_2$ are cr-free HOSC terms. Then $\mathbf{Tr}_{\mathsf{HOS}}(\Gamma \vdash M_1) \subseteq_c \mathbf{Tr}_{\mathsf{HOS}}(\Gamma \vdash M_2)$ iff $\Gamma \vdash M_1 \lesssim^{\mathsf{HOS(ciu)}}_{\mathsf{ter}} M_2$ iff $\Gamma \vdash M_1 \lesssim^{\mathsf{HOS}}_{\mathsf{ter}} M_2$.

Example 12. Let $M_1 \equiv \lambda f^{\text{Unit}\to\text{Unit}}.\, f();\, \Omega_{\text{Unit}}$ and $M_2 \equiv \lambda f^{\text{Unit}\to\text{Unit}}.\, \Omega_{\text{Unit}}$. We will see that $M_1 \not\lesssim^{\mathsf{HOS}}_{\mathsf{err}} M_2$ but $M_1 \lesssim^{\mathsf{HOS}}_{\mathsf{ter}} M_2$. To see this, note that $\mathbf{Tr}_{\mathsf{HOS}}(C^{M_1}_{\rho,c})$ contains prefixes of $\bar{c}(g)\ g(f, c_1)\ \bar{f}((), c_2)\ c_2(())$, while $\mathbf{Tr}_{\mathsf{HOS}}(C^{M_2}_{\rho,c})$ contains only prefixes of $\bar{c}(g)\ g(f, c_1)$. Observe that the only complete trace among them is $\bar{c}(g)$. The trace $t = \bar{c}(g)\ g(f, c_1)\ \bar{f}((), c_2)$ is not complete, because $\mathsf{top}_O(t) = c_2$. Consequently, $\mathbf{Tr}_{\mathsf{HOS}}(\Gamma \vdash M_1) \not\subseteq \mathbf{Tr}_{\mathsf{HOS}}(\Gamma \vdash M_2)$ but $\mathbf{Tr}_{\mathsf{HOS}}(\Gamma \vdash M_1) \subseteq_c \mathbf{Tr}_{\mathsf{HOS}}(\Gamma \vdash M_2)$.

The theorem above generalizes the characterisation of contextual equivalence between HOS terms with respect to HOS contexts [23], where trace completeness means both O- and P-bracketing and "all questions must be answered". Our definition of completeness is weaker (O-bracketing + "the top question must be answered"), because it also covers HOSC terms. However, in the presence of both O- and P-bracketing, i.e. for HOS terms, they will coincide.

# **6 GOS[HOSC]**

Recall that GOS features ground state only and, technically, is the intersection of GOSC and HOS. Consequently, it follows from the previous sections that GOS contexts yield configurations that satisfy both P-visibility and P-bracketing. For such traces, the definability result for GOSC yields a GOS context. Thus, in a similar fashion to the previous sections, we can conclude that O-visible and O-bracketed traces underpin $\lesssim^{\mathsf{GOS}}_{\mathsf{err}}$. To define the GOS LTS we simply combine the restrictions imposed in the previous sections, and define $\mathbf{Tr}_{\mathsf{GOS}}(\Gamma \vdash M)$ analogously. The results on $\lesssim_{\mathsf{ter}}$ from the previous section also carry over to GOS.

**Theorem 8 (**GOS **Full Abstraction).** Suppose $\Gamma \vdash M_1, M_2$ are cr-free HOSC terms. Then:


# **7 Concluding remarks**

Asymmetry Our framework is able to deal with asymmetric scenarios, where programs are taken from HOSC, but are tested with contexts from weaker fragments. For example, we can compare the following two HOSC programs, where f : ((Unit → Unit) → Unit) → Unit is a free identifier.


with div representing divergence. The terms happen to be $\cong^{\mathsf{HOS}}_{\mathsf{err}}$-equivalent, but not $\cong^{\mathsf{HOSC}}_{\mathsf{err}}$-equivalent.

To see this at the intuitive level, we make the following observations.


$$\bar{f}(h, c\_1) \qquad h(g, c\_2) \qquad \bar{g}((), c\_3) \qquad c\_3(()) \qquad \bar{c}(())$$

This trace is O-bracketed, but not P-bracketed, since Player uses throw to answer the initial continuation c directly, rather than $c_2$.

**–** Finally, if HOSC contexts are allowed, it is possible to reach the subterm 'if !b then () else div' with b set to **tt**. This is represented by the trace

$$\begin{array}{cccc} \bar{f}(h, c\_1) & & h(g, c\_2) & & \bar{g}((), c\_3) & & c\_1(()) & & \bar{c}(()) \end{array}$$

This trace is not O-bracketed, because $c_1$ is answered rather than the pending $c_3$. Consequently, the trace witnesses termination of the first term, whereas the second term would diverge during interaction with the same context.

We plan to explore the opportunities presented by this setting in the future, especially with respect to fully abstract translations, for example, from HOSC to GOS.

Richer Types Recall that our full abstraction results are stated for cr-free terms, i.e. terms with cont- and ref-free types at the boundary. Here we first discuss how to extend them to more complicated types.

To deal with reference types at the boundary, i.e. location exchange, one needs to generalize the notion of trace so that each action carries a heap representing the values stored in the disclosed part of the heap, as in [23,27]. The extension to sum, recursive and empty types seems conceptually straightforward: one simply extends the definition of abstract values at these types, following the notion of ultimate pattern in [24]. The same idea should make it possible to allow continuation types at the boundary. Operational game semantics for an extension of HOS with polymorphism has been explored in [15].

Innocence On the other hand, all of the languages we considered were stateful. In the presence of state, all of the actions represented by labels (and their order and frequency) can be observed, because each could generate a side-effect. A natural question to ask is whether our techniques could also be used to provide analogous theorems for purely functional computation, i.e. for contexts taken from the language PCF. Here, the situation is different. For example, the terms $f : \text{Int} \to \text{Int} \vdash f(0)$ and $f : \text{Int} \to \text{Int} \vdash \text{if } f(0)\ f(0)\ f(0)$ should be equivalent, even though their sets of traces are incomparable.

It is known [12] that PCF strategies satisfy a uniformity condition called innocence. Unfortunately, restricting our traces to "O-innocent" ones (as we did with O-visibility and O-bracketing) would not deliver the required characterization. Technically, this is because our arguments rely on the fact that, given a single trace (with suitable properties), we can produce a context that induces that trace and no other traces (except those implied by the definition of a trace). For innocence, this is not possible, due to the uniformity requirement: although we can find a functional context that generates a given innocent trace, it may also generate other traces, which then have to be taken into account when considering contextual testing. This branching property makes it difficult to capture equivalence with respect to functional contexts explicitly, e.g. through traces, as illustrated by the use of the so-called intrinsic quotient in game models of PCF [2,12].

# **8 Related Work**

We have presented four operational game models for HOSC, which capture term interaction with contexts built from any of the four sublanguages **x** ∈ {HOSC, GOSC, HOS, GOS} respectively. The most direct precursor to this work is Laird's trace model for HOS[HOS] [23]. Other frameworks in this spirit include models for objects [18], aspects [16] and system-level code [9]. In [13], Laird's model has been related formally to the denotational game model from [27]. However, in general, it is not yet clear how one can move systematically between the operational and denotational game-based approaches, despite some promising steps reported in [25]. Below we mention other operational techniques for reasoning about contextual equivalence.

In [31], fully abstract eager-normal-form (enf) bisimulations are presented for an untyped λ-calculus with store and control, similar to HOSC (but with control represented using the λμ-calculus). The bisimulations are parameterised by worlds to model the evolution of the store, and bisimulations on contexts are used to deal with control. Like our approach, they are based on symbolic evaluation of open terms. Typed enf-bisimulations, for a language without store and in control-passing style, have been introduced in [24]. Fully abstract enf-bisimulations for a language with state only, corresponding to an untyped version of HOS, are presented in [7]. Earlier works in this strand include [17,29].

Environmental Bisimulations [19,30,32] have also been introduced for languages with store. They work on closed terms, computing the arguments that contexts can provide to terms using an environment similar to our component γ. They have also been extended to languages with call/cc [34] and delimited control operators [5,6].

Kripke Logical Relations [28,4,8] have been introduced for languages with state and control. In [8], a characterization of contextual equivalence for each case **x**[**x**] (**x** ∈ {HOSC, GOSC, HOS, GOS}) is given, using techniques called backtracking and public transitions, which exploit the absence of higher-order store and of control constructs respectively. Importing these techniques into the setting of Kripke Open Bisimulations [14] should allow one to build a bridge between the game-semantics characterizations and Kripke Logical Relations.

Parametric bisimulations [11] have been introduced as an operational technique, merging ideas from Kripke Logical Relations and Environmental Bisimulations. They do not represent functional values coming from the environment using names, but instead use a notion of global and local knowledge to compute these values, reminiscent of the work on environmental bisimulations. The notion of global knowledge depends itself on a notion of evolving world. To our knowledge, no fully abstract Parametric Bisimulations have been presented.

A general theory of applicative [21] and normal-form bisimulations [20] has been developed with the goal of being modular with respect to the effects considered. While that goal is similar to ours, these papers consider monadic and algebraic presentations of effects, aiming in particular to design a general theory for proving soundness and completeness of such bisimulations. These works complement ours, and we would like to explore possible connections.

# **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/ 4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Session Coalgebras: A Coalgebraic View on Session Types and Communication Protocols**

Alex C. Keizer<sup>1</sup>, Henning Basold<sup>2</sup>, and Jorge A. Pérez<sup>3,4</sup>

<sup>1</sup> Master of Logic, ILLC, University of Amsterdam, Amsterdam, The Netherlands
<sup>2</sup> LIACS – Leiden University, Leiden, The Netherlands, h.basold@liacs.leidenuniv.nl
<sup>3</sup> University of Groningen, Groningen, The Netherlands, j.a.perez@rug.nl
<sup>4</sup> CWI, Amsterdam, The Netherlands

**Abstract** Compositional methods are central to the development and verification of software systems. They allow breaking down large systems into smaller components, while enabling reasoning about the behaviour of the composed system. For concurrent and communicating systems, compositional techniques based on behavioural type systems have received much attention. By abstracting communication protocols as types, these type systems can statically check that programs interact with channels according to a certain protocol, i.e., whether the intended messages are exchanged in a certain order. In this paper, we put on our coalgebraic spectacles to investigate session types, a widely studied class of behavioural type systems. We provide a syntax-free description of session-based concurrency as states of coalgebras. As a result, we rediscover type equivalence, duality, and subtyping relations in terms of canonical coinductive presentations. In turn, this coinductive presentation makes it possible to elegantly derive a decidable type system with subtyping for π-calculus processes, in which the states of a coalgebra will serve as channel protocols. Going full circle, we exhibit a coalgebra structure on an existing session type system, and show that the relations and type system resulting from our coalgebraic perspective agree with the existing ones.

**Keywords:** Session types · Coalgebra · Process calculi · Coinduction.

# **1 Introduction**

Communication protocols enable interactions between humans and computers alike, yet different scientific communities rely on different descriptions of protocols: one community may use textual descriptions, another uses diagrams, and yet another may use types. There is then a mismatch, which is fruitful and hindering at the same time. Fruitful, because different views on protocols lead to different insights and technologies. But hindering, because exactly those insights and technologies cannot be easily exchanged. With this paper, we wish to provide a view of protocols that opens up new links between communities and that, at the same time, contributes new insights into the nature of communication protocols.

What would such a view of communication protocols be? Software systems typically consist of concurrent, interacting processes that pass messages over channels. Protocols are then a description of the possible exchanges on channels, without ever referring to the exact structure of the processes that use the channels. Since we may, for example, expect to get an answer only after sending a question, it is clear that such exchanges have to happen in an appropriate order. Therefore, protocols have to be a state-based abstraction of communication behaviour on channels. Because coalgebras provide an abstraction of general state-based behaviour, our proposed view of communication protocols becomes: model the states of a protocol as states of a coalgebra and let the coalgebra govern the exchanges that may happen at each state of the protocol.

The above view of protocols allows us to model protocols as coalgebras. However, protocols are usually not studied for the sake of their description but to achieve certain goals: ensuring correct composition of processes, comparing communication behaviour, or refining and abstracting protocols. Session types [19,20] are an approach to communication correctness for processes that pass messages along channels. The idea is simple: describe a protocol as a syntactic object (a type), and use a type system to statically verify that processes adhere to the protocol. This syntactic approach allows the automatic and efficient verification of many correctness properties. However, the syntactic approach depends on choosing one particular representation of protocols and one particular representation of processes. We show in this paper that our coalgebraic view of protocols can guarantee correct process composition, and allows us to reason about key notions in the world of session types (type equivalence, duality and subtyping), while being completely independent of protocol and process representations.

Our coalgebraic view is best understood by following the distillation process of ideas on a concrete session type system by Vasconcelos [37]. Consider the session type S = ?int. !bool. end, which specifies the protocol on one endpoint of a channel that receives an integer, then outputs a Boolean, and finally terminates the interaction. Note that the protocol S specifies three different states: an input state, an output state, and a final state. Moreover, we note that S specifies only how the channel is seen from one endpoint; the other endpoint needs to use the channel with the dual protocol !int. ?bool. end. Thus, session type systems ensure that the states of S are enabled only in the specified order and that the two channel endpoints implement dual protocols.
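For first-order session types such as S, which exchange only basic data, duality is a simple syntactic involution: flip every send into a receive and vice versa. A minimal Python sketch (the tuple encoding and the name `dual` are our own illustration, not the paper's; as discussed later, this naive recipe breaks down once delegation and recursion enter the picture):

```python
# Session types as nested tuples (illustrative encoding):
#   ("recv", d, T) for ?d.T, ("send", d, T) for !d.T, ("end",) for end.
def dual(t):
    """Flip each action: the other endpoint sends what we receive."""
    if t == ("end",):
        return ("end",)
    kind, data, cont = t
    flipped = "send" if kind == "recv" else "recv"
    return (flipped, data, dual(cont))

S = ("recv", "int", ("send", "bool", ("end",)))  # ?int. !bool. end
assert dual(S) == ("send", "int", ("recv", "bool", ("end",)))  # !int. ?bool. end
assert dual(dual(S)) == S  # duality is an involution
```

The final assertion records the expected property that taking the dual twice returns the original protocol.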

A state-based reading of session types is intuitive and is already present in programming concepts such as typestates [15,32,33], theories of behavioural contracts [4,6,7,13], and connections between session types and communicating automata [10,25]. The novelty and insight of the coalgebraic view is that 1. it describes the state-based behaviour of protocols underlying session types, supporting unrestricted types and delegation, without adhering to any specific syntax or target programming model; 2. it offers a general framework in which key notions such as type equivalence, duality, and subtyping arise as instances of well-known coinductive constructions; and 3. it allows us to derive type systems for specific process languages, like the π-calculus.

Session Coalgebras at Work How does this coalgebraic view of protocols work for general session types? Consider a "mathematical server" that offers three operations to clients: integer multiplication, Boolean negation and quitting. The following session type T specifies a protocol to communicate with this server.

$$T = \mu X. \; \& \begin{cases} mul: & \text{?int. ?int. !int. } X \\ neg: & \text{?bool. !bool. } X \\ quit: & \text{end} \end{cases}$$

T is a recursive protocol, as indicated by "μX. ", which can be repeated. A client can choose, as indicated by &, between the three operations (mul, neg, and quit) and the protocol then continues with the corresponding actions. For instance, after choosing mul, the server requests two integers and, once received, promises to send an integer over the channel. We can see states of the protocol T emerging, and it remains to provide a coalgebraic view on the actions of the protocol to obtain what we will call session coalgebras.

**Figure 1.** Protocol of the mathematical server as a session coalgebra

Fig. 1 depicts a session coalgebra that describes protocol T. It consists of states q0,...,q6, each representing a different state of T, and transitions between these states to model the evolution of T. Meaning is given to the different states and transitions through the labels on the states and transitions. The state labels, written in purple at the top-left of the state name, indicate the branching type of that state. Depending on the branching type, the labels of the transitions bear different meanings. For instance, q<sup>0</sup> is labelled with "&", which indicates that this state initiates an external choice. The labels on the three outgoing transitions of q<sup>0</sup> (mul, neg, quit) then correspond to the possible kinds of message for selecting one of the branches. Continuing, states q1,...,q<sup>5</sup> are labelled with a request for data (label ?) or the sending of data (label !), and the outgoing transition labels indicate the type of the exchanged values (e.g., bool). Finally, state q<sup>6</sup> marks the end of the protocol. Note that the cyclic character of T appears as transitions back to q0; there is no need for an explicit operator to capture recursion.
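To make the state-based reading concrete, the coalgebra of Fig. 1 can be tabulated as a plain transition function. The Python sketch below (our own rendering; state names follow the figure) checks whether a sequence of transition labels is a valid path through the protocol:

```python
# State label plus outgoing transitions for the coalgebra of Fig. 1.
COALG = {
    "q0": ("&", {"mul": "q1", "neg": "q4", "quit": "q6"}),  # external choice
    "q1": ("?", {"int": "q2"}),
    "q2": ("?", {"int": "q3"}),
    "q3": ("!", {"int": "q0"}),   # back to q0: recursion without a mu operator
    "q4": ("?", {"bool": "q5"}),
    "q5": ("!", {"bool": "q0"}),
    "q6": ("end", {}),
}

def run(state, labels):
    """Follow a sequence of transition labels; return the final state,
    or None if some label has no outgoing transition."""
    for label in labels:
        _, trans = COALG[state]
        if label not in trans:
            return None
        state = trans[label]
    return state

assert run("q0", ["mul", "int", "int", "int", "quit"]) == "q6"
assert run("q0", ["neg", "int"]) is None  # neg expects a bool, not an int
```

Note how the cycle back to q0 is just an ordinary table entry, mirroring the observation that no explicit recursion operator is needed.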

**Figure 2.** Session coalgebra for the client's view of the protocol of the mathematical server

A session coalgebra models the view on one channel endpoint, but to correctly execute a protocol we also need to consider the dual session coalgebra that models the other endpoint's view. In our example, the dual of Fig. 1 is given by the diagram in Fig. 2, which concerns states s0,...,s6. More precisely, the states q<sup>i</sup> and s<sup>i</sup> are pairwise dual in the following sense. The external choice of q<sup>0</sup> becomes an internal choice for s0, expressed through the label ⊕, with exactly the same labels on the transitions leaving s0. This means that whenever the server's protocol is in state q<sup>0</sup> and the client's protocol in state s0, then the client can choose to send one of the three signals to the server, thereby forcing the server protocol to advance to the corresponding state. All other states turn from sending states into receiving states and vice versa. We will see that this duality relation between states of session coalgebras has a natural coinductive description that can be obtained with the same techniques as bisimilarity. The duality relation for T will give us then the full picture of the intended protocol.

Suppose a client wants to use multiplication only once but can also handle real numbers as inputs. Such a client would follow the protocol given by the session coalgebra in Fig. 3, with states r0,...,r5.

$$\overset{\oplus}{r_0} \xrightarrow{mul} \overset{!}{r_1} \xrightarrow{\mathtt{int}} \overset{!}{r_2} \xrightarrow{\mathtt{int}} \overset{?}{r_3} \xrightarrow{\mathtt{real}} \overset{\oplus}{r_4} \xrightarrow{quit} \overset{\mathsf{end}}{r_5}$$

**Figure 3.** Session coalgebra that uses only part of a mathematical server

In theories of session types, the protocol of Fig. 2 would be a subtype of this one (cf. [17,16]). Concretely, this new client can also follow the subtype protocol, and can thus communicate with a server following the protocol of Fig. 1. For session coalgebras, we recover the same notion of subtyping by using specific simulation relations that allow us to prove that the behaviour of r<sup>0</sup> can be simulated by s0. Together, simulations and duality provide the foundation of typical session type systems.

We have used thus far session types and coalgebras for protocols with simple control and with exchanges of simple data values. In contrast, rich session type systems can regulate session delegation, the dynamic allocation and exchange of channels by processes. Imagine a process that creates a channel, which should adhere to some protocol T. From an abstract perspective, the process holds both endpoints of the new channel, and has to send one of them to the process it wishes to communicate with. To ensure statically that the receiving process respects the protocol of this new channel, we need to announce this communication as a transmission of the session type T (via an existing channel) and use T to verify the receiving process. Session delegation adds expressiveness and flexibility, but may cause problems in the characterisation of a correct notion of duality [18]. Remarkably, our coalgebraic view of session types makes this characterisation completely natural.

As an example, consider the type T = μX. ?X. X, which models a channel endpoint that infinitely often receives channel ends of its own type T. To obtain the dual of T, we may naïvely try to replace the receive with a send, which results in the type μX. !X. X. The problem is that the two channel endpoints would not agree on the type they are sending or receiving, as any dual type of T needs to send messages of type T. Thus, the correct dual of T would be the type U = μX. !T. X. Both T and U specify the transmission of non-basic types, either the recursion variable X or T, in contrast to the mathematical server that merely stipulated the transmission of basic data values (integers or Booleans).

In our session coalgebras for the mathematical server it sufficed to have simple data types and branching labels on transitions. However, to represent T and U we will need another mechanism to express session delegation. We observe that a transmission in session types consists of the transmitted data and the session type that the protocol must continue with afterwards. Thus, a transition out of a transmitting state in a session coalgebra encompasses both a data transition and a continuation transition. In diagrams of session coalgebras, we indicate the data transition by a coloured arrow and an arrow connecting the data to the continuation transition. Using the combined transitions, Fig. 4 redraws the multiplication part of the mathematical server in Fig. 1.

**Figure 4.** Multiplication part of the mathematical server as a session coalgebra with explicit data transitions

This way, the transition $q_1 \xrightarrow{\mathtt{int}} q_2$ has been replaced by both a data transition into a new state $q$ and a continuation transition into $q_2$. Moreover, $q$ has been declared as a data state that expects an integer to be exchanged (label int).

Having added these transitions to our toolbox, we can present the two types T and U as session coalgebras. The diagram in Fig. 5 shows such a session coalgebra, in which we name the states suggestively T and U.

**Figure 5.** Session coalgebra for a recursive type T and its dual U

Using this presentation as session coalgebras, it is now straightforward to coinductively prove that the states T and U are dual: 1. the states have opposite actions; 2. their data transitions point to equal types; and 3. their continuations are dual by coinduction. Clearly, the last step needs some justification but it will turn out that we can appeal to a standard definition of coinduction in terms of greatest fixed points. This demonstrates that our coalgebraic view on session types makes the definition of duality truly natural and straightforward.

Up to here, we have discussed session types and coalgebras that are linear, i.e., they enforce that protocols complete exactly once. In many situations, one also needs unrestricted types, which enable the sharing of channels between processes that access these channels concurrently. This is the case of a process that offers a service to other processes, for instance a web server. Session delegation allows us to create channels dynamically and check their protocols, but the shared channel used for initiating a session [17] has to offer its protocol to an arbitrary number of clients. Unrestricted types enable us to specify this kind of service offer.

As an example, consider a process that provides a channel for communicating integers to anyone asking, like a town hall official handing out citizen numbers. The type U = μX. un !int. X represents the corresponding protocol, where "un" qualifies the type !int. X as unrestricted. This allows the process holding the end of a channel with type U to transmit an integer to any process that is connected to the shared channel, without any restriction on their number. It is now surprisingly simple to express U in our coalgebraic view: we introduce a new state label "un" (unrestricted), which expresses that the states reachable from this state can be used arbitrarily often, as protocols shared across the different processes connecting to a channel that follows the protocol given by those states. The following diagram shows a session coalgebra with a state that corresponds to the type U.

Contributions and Related Work. In this paper, we introduce the notion of session coalgebra, which justifies the state-based behaviour of session types from

a coalgebraic perspective. This perspective is novel, although specific state-based descriptions of protocols have been considered before [4,6,7,9,10,13,15,25,32,33]. Using coalgebra as a unifying framework for session types has two advantages: 1. session coalgebras can be defined and studied independently from specific syntactic formulations, while keeping the operational behaviour of session types; and 2. we can uncover the innate coinductive nature of key notions in session types, such as duality, subtyping, and type equivalence, through standard coalgebraic techniques. In particular, although communicating automata can also provide syntax-independent characterisations of session types [10,11], such characterisations do not support delegation, an expressive feature which is cleanly justified in our coalgebraic approach. Coinduction has already been exploited in the definition of type equivalence [35], subtyping [17,16] and, especially, duality for systems with recursive types [3,18,24]. Unlike ours, these previous definitions are language-dependent, as they are tailored to specific process languages and/or syntactic variants of the type discipline. Session coalgebras thus enable the generalisation of insights and technologies from specific languages to any protocol specification that fits under the umbrella of state-based sessions.

To enable the verification of processes against protocols described by session coalgebras, we also contribute a type system for π-calculus processes, in which channel types are given by states of an arbitrary session coalgebra. Our type system revisits the one by Vasconcelos [37] from our coalgebraic perspective, while extending it with subtyping. Moreover, we provide a type checking algorithm for that system, provided that the underlying session coalgebra fulfils two intuitive conditions. In doing so, we show how a specific type syntax can be equipped with a session coalgebra structure and how the two decidability conditions are reflected in the type system. This is in contrast to starting with a specific type syntax and then employing category theoretical ideas [36], where coinductive session types are encoded in a session type system with parametric polymorphism [5]. Instead, we show how a session type system can be derived in general from coalgebras.

Organisation Throughout the remainder of the paper, we turn the sketched ideas into a coalgebraic framework. We introduce in Sec. 2 a concrete session type syntax that we will use as an illustration of our framework. In Sec. 3, we define session coalgebras as coalgebras for an appropriate functor and show that the type system from Sec. 2 can be equipped with a coalgebraic structure. The promised coinductive view on type equivalence, duality, subtyping, etc. is provided in Sec. 4. Moreover, we show that these notions are decidable under certain conditions that hold for any reasonable session type syntax, including the one from Sec. 2. Up to that point, session coalgebras have only intrinsic meaning and are not associated with any process representation. Section 5 sets forth a type system for the π-calculus, in which channels are assigned states of a session coalgebra as types. The resulting type system features subtyping and algorithmic type checking, presented in Sec. 6. Some final thoughts are gathered in Sec. 7. An extended version, available online, collects additional material [22].

$$\begin{array}{rcl}
p & ::= & \text{?}T.\,T \;\mid\; \text{!}T.\,T \;\mid\; \&\{l_i : T_i\}_{i \in I} \;\mid\; \oplus\{l_i : T_i\}_{i \in I} \\
q & ::= & \mathsf{lin} \;\mid\; \mathsf{un} \\
T & ::= & d \in D \;\mid\; \mathsf{end} \;\mid\; q\,p \;\mid\; X \in \mathsf{Var} \;\mid\; \mu X.\,T
\end{array}$$

**Figure 6.** Session types over sets of basic data types D and of variables Var

# **2 Session Types**

To motivate the development of session coalgebras, we recall in this section the concrete syntax of an existing session type system by Vasconcelos [37]. After building up our intuition, we introduce session coalgebras in Sec. 3 and show that they can represent this concrete type system.

The types of the system that we will be using are generated by the grammar in Fig. 6, relative to a set of basic data types D and a countable set of type variables Var. This grammar has three syntactic categories: pretypes p, qualifiers q, and session types T. A pretype p is simply a communication action: send (!), receive (?), external choice (&), and internal choice (⊕), the latter two indexed by a finite set I of labels; each action is followed by one or more session types. The simplest session types are basic data types in D and the completed, or terminated, protocol represented by end. A pretype and a qualifier also form a session type, written as q p. The "lin" qualifier enforces that the communication action p has to be carried out exactly once, while the "un" qualifier allows arbitrary use of p. Finally, we can form recursive session types with the fixed point operator μ and the use of type variables. We use the usual notions of α-equivalence, (capture-avoiding) substitution, and free and bound type variables for session types.

The grammar allows arbitrary recursive types. We let Type be the set of all T in which recursive types are contractive and closed, which means that they contain no subterms of the form μX_1.μX_2 . . . μX_n.X_i and no free type variables.

To lighten notation, we usually omit the qualifier lin and let every type end implicitly with end. With these conventions we write, e.g., ?int. instead of lin ?int. end, and un ?int. for a single unrestricted read.

We assume there is some decidable subtyping preorder ≤_D over the basic types. A type is a subtype of another if the subtype can be used anywhere the supertype is accepted. In examples, we use the basic types int, real and bool, and we assume, as usual, that int is a subtype of real.

An important notion is the unfolding of a session type, which we define next:

**Definition 1 (Unfolding).** The unfolding of a recursive type μX.T is defined recursively by

unfold(μX.T) = unfold(T[μX.T /X])

For all other T in Type, unfold is the identity: unfold(T) = T.

Because we assume that types are contractive, unfold(T) terminates for all T. Also, because all types are required to be closed, unfold(T) can never be a variable X. Any such variable would have to be bound somewhere before use, meaning it would have been substituted. Furthermore, unfolding a closed type always yields another closed type, as each removed binder always causes a substitution of the bound variable.
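Unfolding can be sketched directly on a tuple encoding of types, an assumption of this example: `("mu", X, T)` binds X in T, `("var", X)` is a variable, and all other nodes are plain tuples whose components are recursed into.

```python
# A sketch of unfolding (Def. 1) on tuple-encoded session types.

def subst(t, x, s):
    """t[s/x]; since substituted types are closed, no renaming is needed."""
    if t == ("var", x):
        return s
    if t[0] == "mu" and t[1] == x:     # x is shadowed by its own binder
        return t
    return tuple(subst(c, x, s) if isinstance(c, tuple) else c for c in t)

def unfold(t):
    """unfold(mu X.T) = unfold(T[mu X.T / X]); contractivity ensures this
    loop terminates, and closedness ensures it never returns a variable."""
    while t[0] == "mu":
        t = subst(t[2], t[1], t)
    return t

# mu X. lin ?int. X  unfolds to  lin ?int. (mu X. lin ?int. X)
loop = ("mu", "X", ("lin", "?", ("bsc", "int"), ("var", "X")))
assert unfold(loop) == ("lin", "?", ("bsc", "int"), loop)
```

Note that the loop, not recursion, mirrors the definition: each pass peels exactly one μ-binder, and contractivity bounds the number of passes.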

# **3 Session Coalgebra**

Here we will discuss session coalgebras, the main contribution of this paper. The idea is that session coalgebras will be coalgebras for a specific functor F, which will capture the state labels and the various kinds of transitions that we discussed in Sec. 1. An important feature of coalgebras in general, and session coalgebras in particular, is that the states can be given by an arbitrary set. We will leverage this to define a session coalgebra on the set of types Type introduced in Sec. 2.

Before coming to the definition, let us briefly recall some minimal notions of category theory. We will not require much category-theoretical terminology; in fact, we only use the category **Set** of sets and functions. Moreover, we will be dealing with functors F : **Set** → **Set** on the category **Set**. Such a functor maps a set X to a set F(X), and a function f : X → Y to a function F(f) : F(X) → F(Y). To be meaningful, a functor must preserve identities and composition. That is, F maps the identity function id_X : X → X on X to the identity on F(X), i.e., F(id_X) = id_{F(X)}; and, given functions f : X → Y and g : Y → Z, we must have F(g ◦ f) = F(g) ◦ F(f).

A central notion is that of the coalgebras for a functor F. A coalgebra is given by a pair (X, c) of a set X and a function c : X → F(X). For simplicity, we often leave out X and refer to c as the coalgebra. The general idea is that the set X is the set of states and that c assigns to every state its one-step behaviour. In the case of session coalgebras this will be the state labels and outgoing transitions. Given two coalgebras c : X → F(X) and d: Y → F(Y ), we say that h: X → Y is a homomorphism, if d ◦ h = F(h) ◦ c. Coalgebras and their homomorphisms form a category, with the same identity maps and composition as in **Set**.

We will have to analyse subsets of coalgebras that are closed under transitions. Given a coalgebra c : X → F(X), we say that d : Y → F(Y) with Y ⊆ X is a subcoalgebra of c if the inclusion map Y → X is a coalgebra homomorphism. Note that in this case c(Y) ⊆ F(Y) and thus d is the restriction of c to Y. Hence, we also refer to Y itself as a subcoalgebra. The subcoalgebra generated by x ∈ X in c, denoted ⟨x⟩_c, is the least subset of X that contains x and is a subcoalgebra of c. Intuitively, it is the set consisting of x and all states reachable from x.

Coming to the concrete case of session coalgebras, we now construct a functor that allows us to capture the state labels and the different kinds of transitions. Keeping in mind that states of a session coalgebra correspond to states of a protocol, we need to be able to label the states with enabled operations.

**Definition 2 (Operations and Polarities).** The operation of a state describes the action it represents: com marks the transmission (sending or receiving) of a value; branch marks an (internal or external) choice; end marks the completed protocol; bsc marks a basic data type; and un marks an unrestricted type. States that transmit data (labelled with com) or allow for choice (labelled with branch) also have a polarity, which can be either in (a receiving action or external choice) or out (a sending action or internal choice). We let O be the set of all operations O = {com, branch, end, bsc, un} and P the set of polarities P = {in, out}.

Note that pairs in {com, branch} × P directly correspond to the actions of a session type: ? = (com, in), ! = (com, out), & = (branch, in) and ⊕ = (branch, out). We will be using these markers to abbreviate the pairs.

Now that we have the possible operations of a protocol, we need to define the transitions that may follow each operation. Recall that the transitions at a choice state have to be labelled with the messages that resolve that choice. We therefore assume to be given a set L of possible choice labels, and use the variable l to refer to an element of L. We write $\mathcal{P}^{+}_{<\aleph_0}(L)$ for the set of all finite, non-empty subsets of L, and let the variables L, L_1, L_2, . . . range over such subsets.

Our goal is to define a polynomial functor [14] that captures the state labels and transitions. This requires some further formal language. First, we let ∗ and d be two fixed, distinct objects. Second, given sets X and Y, we denote by X^Y the set of all (total) functions from Y to X. Finally, given a family of sets $\{X_i\}_{i \in I}$ indexed by some set I, their coproduct is the set $\coprod_{i \in I} X_i = \{(i, x) \mid i \in I,\ x \in X_i\}$.

We are now ready to define session coalgebras:

**Definition 3 (Session Coalgebras).** Let A and B be sets defined as follows, where we recall that D is the set of all basic data types.

$$A = (\{\mathsf{com}\} \times P) \;\sqcup\; (\{\mathsf{branch}\} \times P \times \mathcal{P}^{+}_{<\aleph_0}(L)) \;\sqcup\; \{\mathsf{end}\} \;\sqcup\; (\{\mathsf{bsc}\} \times D) \;\sqcup\; \{\mathsf{un}\}$$

$$B_{(\mathsf{com},p)} = \{*, d\} \qquad B_{(\mathsf{branch},p,L')} = L' \qquad B_{\mathsf{end}} = B_{(\mathsf{bsc},d)} = \emptyset \qquad B_{\mathsf{un}} = \{*\}$$
The polynomial functor F : **Set** → **Set** is defined by

$$F(X) = \coprod\_{a \in A} X^{B\_a}$$

$$F(f)(a, g) = (a, f \circ g)$$

A coalgebra (X, c) for the functor F is called a session coalgebra.

Let us unfold this definition. Given a session coalgebra c : X → F(X) and a state x ∈ X, we find in c(x) ∈ F(X) the information of x encoded as a tuple (a, f) with a ∈ A and f : B_a → X. From a, we directly obtain the operation, together with the polarity for com states, the type of values communicated for bsc states, or the message labels for branch states. The function f encodes the transitions out of x. The domain of f is exactly the set of labels that have a transition, and depends on the kind of state declared by a.

It is convenient to partition the domain of the transition map f into data and continuations. Notice that only com states have data transitions; for all other states, every transition is a continuation. As usual, we write dom(f) for the domain of f.

**Definition 4 (Domains).** Suppose c(x) = (com, p, f); then the data domain of f is dom_D(f) = {d} and the continuation domain is dom_C(f) = {∗}. In all other cases, dom_D(f) = ∅ and dom_C(f) = dom(f).

### **3.1 Alternative Presentation of Session Coalgebras**

Session coalgebras (X, c) are rather complex. We show how to build up c as the combination of two simpler functions, denoted σ and δ, so that c(x) = (σ(x), δ(x)) with σ : X → A and δ(x) : B_{σ(x)} → X. Observe that every state gets assigned an operation in O; thus we may assume that there is a map op : X → O. Depending on the operation given by op(x), the label of x has different further ingredients, which are captured in the following proposition.

To formulate the proposition, we need some notation. Suppose f : X → I is a map and i ∈ I. We define the fibre X^f_i of f over i to be X^f_i = {x ∈ X | f(x) = i}. Moreover, we let the pairing of functions f and g be ⟨f, g⟩(x) = (f(x), g(x)).

**Proposition 1.** A session coalgebra (X, c) can equivalently be expressed by providing the following maps:

$$\begin{array}{ll}
op\colon X \to O & pol\colon X^{op}_{\mathsf{com}} \cup X^{op}_{\mathsf{branch}} \to P \\[2pt]
la\colon X^{op}_{\mathsf{branch}} \to \mathcal{P}^{+}_{<\aleph_0}(L) & da\colon X^{op}_{\mathsf{bsc}} \to D \\[2pt]
\delta_a\colon X^{\sigma}_{a} \to X^{B_a} \ \text{ for every } a \in A &
\end{array}$$
where

$$\sigma(x) = \begin{cases} \langle op, pol \rangle(x) & \text{if } op(x) = \text{com} \\ \langle op, pol, la \rangle(x) & \text{if } op(x) = \text{branch} \\ \langle op, da \rangle(x) & \text{if } op(x) = \text{bsc} \\ op(x) & \text{if } op(x) = \text{end or } op(x) = \text{un} \end{cases}$$

We specified δ_a as a family of transition functions to preserve each specific signature. We can define a single global transition function as δ(x) = δ_{σ(x)}(x). The coalgebra is then obtained as c(x) = (σ(x), δ(x)). As long as the provided maps fit their signatures, this derived function conforms to c : X → F(X).

The procedure also works backwards: given any session coalgebra, we can derive functions op(x), pol(x), etc. from c(x). We will often use op(x), σ(x), and δ(x) to refer to those specific parts of an arbitrary session coalgebra.

### **3.2 Coalgebra of Session Types**

In Sec. 1, we informally explained how session types can be represented as states of a session coalgebra. We will now justify this claim by showing that session types are, in fact, states of a specific session coalgebra (Type, cType).

We define the functions op, pol, δ, and la (see Prop. 1) on Type. Using Prop. 1, we can then derive cType : Type → F(Type). Let us begin with the linear types.

$$\begin{array}{llll}
op(\mathsf{lin}\ ?T.U) = \mathsf{com} & pol(\mathsf{lin}\ ?T.U) = \mathsf{in} & \delta(\mathsf{lin}\ ?T.U)(d) = T & \delta(\mathsf{lin}\ ?T.U)(*) = U \\
op(\mathsf{lin}\ !T.U) = \mathsf{com} & pol(\mathsf{lin}\ !T.U) = \mathsf{out} & \delta(\mathsf{lin}\ !T.U)(d) = T & \delta(\mathsf{lin}\ !T.U)(*) = U \\
op(\mathsf{lin}\ \&\{l_i : T_i\}) = \mathsf{branch} & pol(\mathsf{lin}\ \&\{l_i : T_i\}) = \mathsf{in} & la(\mathsf{lin}\ \&\{l_i : T_i\}) = \{l_i\}_{i \in I} & \delta(\mathsf{lin}\ \&\{l_i : T_i\})(l_i) = T_i \\
op(\mathsf{lin}\ \oplus\{l_i : T_i\}) = \mathsf{branch} & pol(\mathsf{lin}\ \oplus\{l_i : T_i\}) = \mathsf{out} & la(\mathsf{lin}\ \oplus\{l_i : T_i\}) = \{l_i\}_{i \in I} & \delta(\mathsf{lin}\ \oplus\{l_i : T_i\})(l_i) = T_i
\end{array}$$
Under this definition, la(T) is indeed finite, by virtue of an expression being a finite string. The completed protocol end and basic types d are straightforward: c(end)=(end) and c(d)=(bsc, d) for any d ∈ D. Recursive types are handled according to their unfolding, c(μX. T) = c(unfold(μX. T)). Recall that contractivity ensures that unfold always terminates. As our types are closed, all recursion variables are substituted during the unfolding of their binder. Consequently, we do not need to define c on these variables. Also note that this definition results in an equi-recursive interpretation of recursive types.

Session types can also be unrestricted, and consist of a pretype p with a qualifier un. Session coalgebras have un states to mark unrestricted types; the continuation describes what the actual interaction is. Thus, we define op(un p) = un and δ(un p)(∗) = lin p.
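These clauses can be sketched as one function on a tuple encoding of types; the encoding, the helper names, and the name `c_type` are all assumptions of this example. Each state is mapped to a label together with its finite transition map, with recursive types handled through their unfolding.

```python
# A sketch of the coalgebra map c_Type on tuple-encoded session types.

def subst(t, x, s):
    """t[s/x] for closed s."""
    if t == ("var", x):
        return s
    if t[0] == "mu" and t[1] == x:
        return t
    return tuple(subst(c, x, s) if isinstance(c, tuple) else c for c in t)

def c_type(t):
    """Return (sigma(t), delta(t)), with delta given as a dict of transitions."""
    while t[0] == "mu":                # equi-recursive: c(mu X.T) = c(unfold(mu X.T))
        t = subst(t[2], t[1], t)
    if t[0] == "end":
        return (("end",), {})
    if t[0] == "bsc":
        return (("bsc", t[1]), {})     # basic data type: no transitions
    if t[0] == "un":                   # un p: single continuation to lin p
        return (("un",), {"*": ("lin",) + t[1:]})
    # linear communication ("lin", "?"/"!", payload, continuation)
    if t[1] in ("?", "!"):
        pol = "in" if t[1] == "?" else "out"
        return (("com", pol), {"d": t[2], "*": t[3]})
    # linear choice ("lin", "&"/"+", ((label, type), ...))
    pol = "in" if t[1] == "&" else "out"
    return (("branch", pol, frozenset(l for l, _ in t[2])), dict(t[2]))

loop = ("mu", "X", ("lin", "?", ("bsc", "int"), ("var", "X")))
label, delta = c_type(loop)
assert label == ("com", "in") and delta["*"] == loop
```

Observe that the "*"-transition of the recursive type points back to the type itself, so the induced coalgebra on this state is finite even though the unfolded syntax tree is infinite.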

Remark 1 (Alternative Syntaxes and their Functors). The unrestricted session types that we have adopted are fairly standard, but they are not the only ones in the literature. Most notably, Gay and Hole [17] defined a type [T1,...,Tn] that allows infinite reading and writing. To allow for such behaviour in session coalgebras, we can change B_un to a set of two elements, such as {∗_1, ∗_2}. Like internal choice, the two transitions describe an option of which behaviour to follow, but without sending synchronisation signals. One transition could go to a read, and the other to a write, both recursively continuing as the original type [T1,...,Tn].

It is possible, although not entirely trivial, to change the further definitions appropriately and obtain a decidable type checking algorithm encompassing both the syntax presented in this work and Gay and Hole's syntax. We choose not to, so as to keep the presentation simple.

# **4 Type Equivalence, Duality and Subtyping**

Up to here, we have represented session types as session coalgebras, but we have not yet given a precise semantics to them. As a first step, we will define three relations on states: bisimulation, duality, and simulation. Bisimulation is also called behavioural equivalence for types; we will show that bisimilar types are indeed equivalent. Duality specifies complementary types: it tells us which types can form a correct interaction. Simulation will provide a notion of subtyping: it tells us when a type can be used where another type was expected. Besides relations on session coalgebras, we also introduce the parallelizability of states that allows us to rule out certain troubling unrestricted types. Finally, we will obtain conditions on coalgebras to ensure the decidability of the three relations and therefore the type system that we derive in Sec. 5.

In the following, we denote by Rel_X the poset P(X × X) of all relations on X ordered by inclusion. Recall that a post-fixpoint of a monotone map g : Rel_X → Rel_X is a relation R ∈ Rel_X with R ⊆ g(R). Note that Rel_X is a complete lattice and that therefore any monotone map on it has a greatest post-fixpoint by the Knaster-Tarski theorem [34]. We will define bisimulation, simulation, and duality as greatest (post-)fixpoints of monotone maps, and therefore call these definitions coinductive. The resulting notions match our intuitive expectations, and the interaction of infinite behaviour with the other type features is handled automatically. The coinductive definitions also immediately give us proof techniques for equivalence, duality and subtyping: to show that two states are, say, dual, we only have to exhibit a relation that contains the pair of states and show that this relation is a post-fixpoint. This technique can be improved in various ways [30], and we will show that these relations are decidable for reasonable session coalgebras.

### **4.1 Bisimulation**

Two states of a coalgebra are said to be bisimilar if they exhibit equivalent behaviour. We abstract away from the precise structure of a coalgebra and only consider its observable behaviour. Two states are bisimilar if their labels are equal and if the states at the end of matching transitions are again bisimilar. There is one exception to the equality of labels: basic types can be related via their pre-order, which does not have to coincide with equality.

Fix some coalgebra (X, c) and let c* : Rel_{F(X)} → Rel_X be the binary preimage of c defined as

$$c^\*(R) = \{(x, y) \mid (c(x), c(y)) \in R\}\ .$$

**Definition 5.** We define the function f_∼ : Rel_X → Rel_{F(X)} as

$$\begin{aligned} f\_{\sim}(R) &= \{ \ ((a, f), (a, f')) \mid (\forall \alpha \in dom(f)) \quad f(\alpha) \ R \ f'(\alpha) \} \\ &\cup \{ \ ((\text{bsc}, d, f\_{\emptyset}), (\text{bsc}, d', f\_{\emptyset})) \mid d \leq\_D d' \land d' \leq\_D d \} \end{aligned}$$

where f_∅ : ∅ → X is the empty function.

It is easily checked that both c* and f_∼ are monotone maps, and hence so is their composition. Thus, the greatest fixpoint in the following definition exists.

**Definition 6.** A relation R is called a bisimulation if it is a post-fixpoint of c* ◦ f_∼. We call the greatest fixpoint bisimilarity and denote it by ∼.

### **4.2 Duality**

Duality describes exactly opposite types in terms of their polarity. That is, the dual of input is output and the dual of output is input: $\overline{\mathsf{in}} = \mathsf{out}$ and $\overline{\mathsf{out}} = \mathsf{in}$. We can extend this to the tuples a ∈ A (see Def. 3), with the exception of basic types, because they do not describe channels:

$$\begin{array}{c} \overline{(\text{com},p)} = (\text{com}, \overline{p}) \\ \overline{(\text{branch},p,L)} = (\text{branch}, \overline{p}, L) \\ \overline{(\text{bsc},d)} \text{ is undefined} \end{array} \qquad \begin{array}{c} \overline{(\text{end})} = (\text{end}) \\ \overline{(\text{un})} = (\text{un}) \end{array}$$
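The dual-label operation above can be sketched as a small function on tuple-encoded labels (the encoding is an assumption of this example): polarities flip, end and un are self-dual, and bsc labels have no dual.

```python
# A sketch of the dual label operation a-bar of Sec. 4.2.

FLIP = {"in": "out", "out": "in"}

def dual_label(a):
    if a[0] == "com":
        return ("com", FLIP[a[1]])
    if a[0] == "branch":
        return ("branch", FLIP[a[1]], a[2])   # message labels are preserved
    if a[0] in ("end", "un"):
        return a                              # self-dual
    raise ValueError("basic types do not describe channels")

assert dual_label(("com", "in")) == ("com", "out")
b = ("branch", "out", frozenset({"eq", "neg"}))
assert dual_label(dual_label(b)) == b         # dualising twice is the identity
```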

The next step is to compare transitions. Continuations in dom_C(f) need to be dual. The data types that are sent or received need to be equivalent, hence transitions in dom_D(f) need to go to bisimilar states. We capture this idea with the monotone map f_⊥ : Rel_X → Rel_{F(X)} defined as follows.

$$f\_{\perp}(R) = \left\{ \left. \left( (a, f), (\overline{a}, f') \right) \right| \begin{matrix} (\forall \alpha \in dom\_{C}(f)) & f(\alpha) \ R \ f'(\alpha) \text{ and } \\ (\forall \beta \in dom\_{D}(f)) & f(\beta) \sim f'(\beta) \end{matrix} \right\}$$

**Definition 7.** A relation R is called a duality relation if it is a post-fixpoint of c* ◦ f_⊥. We call the greatest fixpoint duality and denote it by ⊥.

It is useful to have a function mapping any x ∈ X to its dual $\overline{x}$, as long as duality is defined on x. However, even if duality is defined on x, the dual state might not be in X. Thus, we define the dual closure of X as the set $X^{\perp} = X \cup \{\overline{x} \mid \overline{\sigma(x)}\ \text{is defined}\}$, where $\overline{x}$ is understood to be a fresh state not in X, distinct from $\overline{y}$ for any y ∈ X with x ≠ y. For the original states, $c^{\perp}(x) = c(x)$; for the new states we define $\sigma^{\perp}(\overline{x}) = \overline{\sigma(x)}$ and

$$\begin{array}{ll}
\delta^{\perp}(\overline{x})(\alpha) = \overline{\delta(x)(\alpha)} & \text{for all } \alpha \in dom_C(\delta(x)), \text{ and} \\
\delta^{\perp}(\overline{x})(\beta) = \delta(x)(\beta) & \text{for all } \beta \in dom_D(\delta(x))
\end{array}$$

Thus, the dual closure is a coalgebra such that $x \perp \overline{x}$ for any x. Notice that taking the dual twice always yields a bisimilar state, so we can define the duality function as an involution, $\overline{\overline{x}} = x$, rather than adding yet more states. Clearly, the dual closure of a finite set is finite.

**Proposition 2.** $x \perp \overline{x}$ for every state x such that $\overline{x}$ is defined.

### **4.3 Parallelizability**

Unlike a linear endpoint, a channel endpoint with an unrestricted type may be shared between different parallel processes; each of them uses it independently, without informing the others. Furthermore, there is no way to coordinate which process receives which message. If the unrestricted endpoint sends a message, it could be read by a process that just started using the channel, or by a process that is almost done using the channel, or by a process that is anywhere in between.

In practice, this means an unrestricted channel can only perform one kind of communication action. However, session coalgebras allow us to define arbitrarily complex unrestricted types. For example, μX. un ?int. un ?bool. X is an element of Type, but we know that sending both int and bool over the same unrestricted channel causes problems.

**Definition 8.** Given a coalgebra (X, c), some subset Y ⊆ X is parallelizable, written par(Y ), if for every x and y in Y one of the following holds: x ∼ y, σ(x) = un, or σ(y) = un.

We know that un states do not represent communications; all other states, though, have to represent the same kind of action. We make this slightly stronger by requiring that they are pairwise bisimilar.

Often we are interested in the parallelizability of a specific state only. Recall that ⟨x⟩_c denotes the subcoalgebra generated by x ∈ X in c.

**Definition 9.** Let ⟨x⟩^▷_c be the smallest subset of ⟨x⟩_c that contains x and is closed under continuation transitions:

$$\langle x \rangle_c^\rhd = \bigcap \{\, Y \subseteq X \mid x \in Y \text{ and } \delta(y)(\alpha) \in Y \text{ for all } y \in Y \text{ and } \alpha \in dom_C(\delta(y)) \,\}$$

A state x is parallelizable, written par(x), if ⟨x⟩^▷_c is parallelizable.

### **4.4 Simulation and Subtyping**

Intuitively, a coalgebra simulates another if the behaviour of the latter "is contained in" the former. Subtyping, originally defined on session types by Gay and Hole [17], is a notion of substitutability of types [16]. We will define our notion of simulation such that it coincides with subtyping, just like bisimulation provides a notion of type equivalence [17].

Consider a process that expects a channel of type T = ?real. The process reads a value, and expects it to be a real number and treat it as such. We defined int as a subtype of real, so the process can operate correctly if it receives an integer instead; that is, ?int is a subtype of T. Now consider a process that expects a channel of type !int, on which it can send any integer. In this case we cannot restrict the channel to a subtype: as all integers are valid where real numbers are expected, we can generalise the channel type to !real.

Now, in the input case the session types are related (in the subtyping relation) in the same order as the data types; this is called covariance. For output, the order is reversed; this is called contravariance. The same idea holds for labelled choices: the subtype of an external choice can have a subset of choices, while the subtype of an internal choice can add more options. For all types, it holds that states reached through transitions are covariant, i.e., if T is a subtype of U, continuations of T must be subtypes of continuations (of the same label) of U. The monotone map h in Fig. 7 captures these ideas formally.
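The variance conditions can be sketched as a worklist check on a coalgebra given concretely as a dict from states to `(label, {action: successor})`; this input format, and the helper names, are assumptions of the example, not the paper's notation.

```python
# A sketch of the variance conditions behind the simulation map h (Fig. 7).

def one_step_obligations(c, t, u, below):
    """Pairs that must be similar for t to be simulated by u, or None."""
    (a, f), (b, g) = c[t], c[u]
    if a[0] == "bsc" and b[0] == "bsc":
        return [] if a == b or (a[1], b[1]) in below else None
    if a[0] == b[0] == "com" and a[1] == b[1]:
        data = (f["d"], g["d"]) if a[1] == "in" else (g["d"], f["d"])
        return [(f["*"], g["*"]), data]          # output flips the payload pair
    if a[0] == b[0] == "branch" and a[1] == b[1]:
        small, big = (a[2], b[2]) if a[1] == "in" else (b[2], a[2])
        if not small <= big:                     # L1 in L2 for &, L2 in L1 for choice
            return None
        return [(f[l], g[l]) for l in small]
    if a[0] == b[0] == "un":
        return [(f["*"], g["*"])]
    return [] if a == b else None                # e.g. end is only below end

def subtype(c, t, u, below=frozenset()):
    todo, seen = [(t, u)], set()
    while todo:
        p = todo.pop()
        if p in seen:
            continue
        seen.add(p)
        obs = one_step_obligations(c, p[0], p[1], below)
        if obs is None:
            return False
        todo.extend(obs)
    return True

c = {"ri": (("com", "in"),  {"d": "int",  "*": "e"}),
     "rr": (("com", "in"),  {"d": "real", "*": "e"}),
     "wi": (("com", "out"), {"d": "int",  "*": "e"}),
     "wr": (("com", "out"), {"d": "real", "*": "e"}),
     "int": (("bsc", "int"), {}), "real": (("bsc", "real"), {}),
     "e": (("end",), {})}
below = {("int", "real")}                        # int <=_D real
assert subtype(c, "ri", "rr", below)             # ?int below ?real (covariant input)
assert subtype(c, "wr", "wi", below)             # !real below !int (contravariant output)
assert not subtype(c, "rr", "ri", below)
```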

**Definition 10.** A relation R is called a simulation if it is a post-fixpoint of c* ◦ h. We call the greatest fixpoint similarity and denote it by ⊑.

$$\begin{aligned}
h(R) = {} & \{ ((\mathsf{com},\mathsf{in},f),\ (\mathsf{com},\mathsf{in},g)) \mid f(*)\ R\ g(*) \text{ and } f(d)\ R\ g(d) \} \\
\cup{} & \{ ((\mathsf{com},\mathsf{out},f),\ (\mathsf{com},\mathsf{out},g)) \mid f(*)\ R\ g(*) \text{ and } g(d)\ R\ f(d) \} \\
\cup{} & \{ ((\mathsf{branch},\mathsf{in},L_1,f),\ (\mathsf{branch},\mathsf{in},L_2,g)) \mid L_1 \subseteq L_2 \text{ and } \forall l \in L_1.\ f(l)\ R\ g(l) \} \\
\cup{} & \{ ((\mathsf{branch},\mathsf{out},L_1,f),\ (\mathsf{branch},\mathsf{out},L_2,g)) \mid L_2 \subseteq L_1 \text{ and } \forall l \in L_2.\ f(l)\ R\ g(l) \} \\
\cup{} & \{ ((\mathsf{bsc},d,f_\emptyset),\ (\mathsf{bsc},d',f_\emptyset)) \mid d \leq_D d' \} \\
\cup{} & \{ ((\mathsf{end},f_\emptyset),\ (\mathsf{end},f_\emptyset)) \} \\
\cup{} & \{ ((\mathsf{un},f),\ (\mathsf{un},g)) \mid f(*)\ R\ g(*) \}
\end{aligned}$$

**Figure 7.** Monotone map h : Rel_X → Rel_{F(X)} that defines simulations

**Figure 8.** Simulation for two mathematical server clients (indicated by dotted arrows)

Let us illustrate similarity by means of an example.

Example 1. Recall the two client protocols for our mathematical server in Figs. 2 and 3. We can now prove our claim that the latter can also connect to the server because it is a supertype of the client protocol in Fig. 2. To do that, we have to establish a simulation relation between the states of both client protocols. In Fig. 8, we display a part of both session coalgebras side by side and indicate with dotted arrows the pairs that have to be related by a simulation relation to show that these states are similar, that is, related by ⊑. It should be noted that we simulate states from the second coalgebra by those of the first, that is, we show s_k ⊑ r_k for the shown states. There is one exception, namely q_int ⊑ q_real.

The following proposition records some properties of and tight connections between the relations that we introduced.

**Proposition 3.** Bisimilarity ∼ is an equivalence relation, duality ⊥ is symmetric, and similarity ⊑ is a preorder. Moreover, for all states x, y, and z of a session coalgebra, we have that

1. x ∼ y iff x ⊑ y and y ⊑ x;
2. x ⊥ y and x ⊥ z implies y ∼ z; and
3. x ⊥ y and y ∼ z implies x ⊥ z.

### **4.5 Decidability**

In a practical type checker, we need an algorithm to decide the relations defined above. In this subsection we show an algorithm that computes the answer in finite time for a certain class of types.

**Definition 11.** A coalgebra c is finitely generated if ⟨x⟩_c is finite for all x.

This restriction is not problematic for types, as the following lemma shows.

**Lemma 1.** The coalgebra of types (Type, cType) is finitely generated.

To determine whether two states x and y are bisimilar, we need to determine if there exists a bisimulation R with xRy. We start with the simplest relation R = {(x, y)}, and ask if this is a bisimulation.

First, we check that for all (u, w) ∈ R, σ(u) = σ(w), or, in the case of bsc states, that da(u) ≤_D da(w) and da(w) ≤_D da(u). If σ(u) ≠ σ(w) for some pair in R, we know that no superset of R is a bisimulation, and the algorithm rejects.

Second, we check the matching transitions. For every (u, w) ∈ R and α ∈ dom(δ(u)) we check whether (δ(u)(α), δ(w)(α)) ∈ R. If we encounter a missing pair, we add it to R and ask whether this new relation is a bisimulation, i.e., return to the first step. If all destinations for matching transitions are present in R, then R is, by construction, a bisimulation containing (x, y). Hence, x ∼ y.

This algorithm tries to construct the smallest possible bisimulation containing (x, y), by only adding strictly necessary pairs. If the algorithm rejects, there is no such bisimulation; hence, x ≁ y. Additionally, the algorithm only examines pairs in ⟨x⟩_c × ⟨y⟩_c. If there are finitely many such pairs, the algorithm terminates in finite time.
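The procedure can be sketched as a worklist loop over a coalgebra given concretely as a dict from states to `(label, {action: successor})`; this input format is an assumption of the example.

```python
# A sketch of the bisimilarity decision procedure of Sec. 4.5.

def bisimilar(c, x, y, below=frozenset()):
    """Decide x ~ y; `below` is the preorder <=_D on basic types, as pairs."""
    todo, rel = [(x, y)], set()
    while todo:
        u, w = todo.pop()
        if (u, w) in rel:
            continue
        (a, f), (b, g) = c[u], c[w]
        if a[0] == "bsc" and b[0] == "bsc":
            # basic types must be related by <=_D in both directions
            if a != b and not ((a[1], b[1]) in below and (b[1], a[1]) in below):
                return False
        elif a != b:
            return False               # label mismatch: no bisimulation exists
        rel.add((u, w))
        todo.extend((f[k], g[k]) for k in f)   # matching transitions
    return True

# a one-state loop is bisimilar to its two-state unrolling
c = {"p": (("com", "in"), {"d": "i", "*": "p"}),
     "q": (("com", "in"), {"d": "i", "*": "r"}),
     "r": (("com", "in"), {"d": "i", "*": "q"}),
     "i": (("bsc", "int"), {}),
     "e": (("end",), {})}
assert bisimilar(c, "p", "q")
assert not bisimilar(c, "p", "e")
```

Termination follows exactly as in the text: only pairs of states reachable from the initial pair are ever added, so on a finitely generated coalgebra the worklist empties.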

The algorithm described above can be suitably adapted to similarity and duality, which gives us the following result.

**Theorem 1.** Bisimilarity, similarity, and duality of any states x and y are decidable if ⟨x⟩_c and ⟨y⟩_c are finite. Parallelizability of any state x is decidable if ⟨x⟩^▷_c (Definition 9) is finite.

**Corollary 1.** Bisimilarity, similarity, and duality are decidable for cType.

# **5 Typing Rules**

Session types are meant to discipline the behaviour of the channels of an interacting process, so as to ensure that prescribed protocols are executed as intended. Up to here, we have focused on session types (i.e., their representation as session coalgebras and coinductively-defined relations on them) without committing to a specific syntax for processes. This choice is on purpose: our goal is to provide a truly syntax-independent justification for session types. In this section, we introduce a syntactic notion of processes and rely on session coalgebras to define the typing rules for a session type system.

$$P, Q ::= \overline{x}\langle y \rangle.P \;\mid\; x(y).P \;\mid\; x \rhd \{l_i : P_i\}_{i \in I} \;\mid\; x \lhd l.P \;\mid\; P \mid Q \;\mid\; !P \;\mid\; \mathbf{0} \;\mid\; (\nu xy)P$$
**Figure 9.** Process syntax

### **5.1 A Session** *π***-calculus**

The π-calculus is a formal model of interactive computation in which processes exchange messages along channels (or names) [26,31]. As such, it is an abstract framework in which key features such as name mobility, (message-passing) concurrency, non-determinism, synchronous communication, and infinite behaviour have rigorous syntactic representations and precise operational meaning. We consider a session π-calculus based on [37,17], i.e., a variant of the π-calculus whose operators are tailored to the protocols expressed by session types.

We assume base sets of variables (x, y, z, . . .) and values (v, v′, . . .), which can be variables or the Boolean constants (true and false). There is also a set of labels L, ranged over by l, l′, . . .. The syntax of processes (P, Q, . . .) is given by the grammar in Fig. 9. We discuss the salient aspects of the syntax. A process x̄⟨y⟩.P denotes the output of channel y along channel x, which precedes the execution of P. Dually, a process x(y).P denotes the input of a value v along channel x, which precedes the execution of process P[v/y], i.e., the process P in which all free occurrences of y have been substituted by v. Processes x ▷ {l_i : P_i}_{i∈I} and x ◁ l.P implement a labelled choice mechanism. Given a finite index set I, process x ▷ {l_i : P_i}_{i∈I}, known as branching, denotes an external choice: the reception of a label l_j (with j ∈ I) along channel x precedes the execution of the continuation P_j. Process x ◁ l.P, known as selection, denotes an internal choice; it is meant to interact with a complementary branching. Given processes P and Q, process P | Q denotes their parallel composition, which enables their simultaneous execution. The process !P, the replication of P, denotes the composition of infinitely many copies of P running in parallel, i.e., P | P | ···. Process **0** denotes inaction. Finally, process (νxy)P is arguably the main difference with respect to usual presentations of the π-calculus: it denotes a restriction operator that declares x and y as covariables, i.e., as complementary endpoints of the same channel, with scope P.

The operational semantics for processes is defined as a reduction relation denoted −→, by relying on a notion of structural congruence on processes, denoted ≡. Figure 10 defines these two notions. Intuitively, two processes are structurally congruent if they are identical in behaviour, but not necessarily in structure. It is the smallest congruence relation satisfying the axioms in Fig. 10 (bottom). We say a process P reduces to Q, written P −→ Q, when there is a single execution step yielding Q from P. We comment on the rules in Fig. 10 (top). r-com formalizes

### **Reduction**

$$\begin{array}{ll}
(\nu xy)(\overline{x}\langle v\rangle.P \mid y(z).Q \mid R) \longrightarrow (\nu xy)(P \mid Q[v/z] \mid R) & \text{[r-com]} \\[4pt]
(\nu xy)(x \lhd l_j.P \mid y \rhd \{l_i : Q_i\}_{i \in I} \mid R) \longrightarrow (\nu xy)(P \mid Q_j \mid R) \quad (j \in I) & \text{[r-sync]}
\end{array}$$

$$\frac{P \longrightarrow Q}{(\nu xy)P \longrightarrow (\nu xy)Q}\ \text{[r-res]} \qquad
\frac{P \longrightarrow Q}{P \mid R \longrightarrow Q \mid R}\ \text{[r-par]} \qquad
\frac{P \equiv P' \quad P' \longrightarrow Q' \quad Q' \equiv Q}{P \longrightarrow Q}\ \text{[r-cong]}$$

#### **Structural congruence**

Parallel composition:
$$P \mid Q \equiv Q \mid P \qquad (P \mid Q) \mid R \equiv P \mid (Q \mid R) \qquad P \mid \mathbf{0} \equiv P \qquad\, !P \equiv P \mid\, !P$$
Scope restriction:
$$(\nu xy)(\nu vw)P \equiv (\nu vw)(\nu xy)P \qquad (\nu xy)\mathbf{0} \equiv \mathbf{0} \qquad (\nu xy)P \equiv (\nu yx)P$$
$$(\nu xy)(P \mid Q) \equiv ((\nu xy)P) \mid Q \quad \text{if } x \text{ and } y \text{ not free in } Q$$

**Figure 10.** Reduction semantics

the exchange of a value over a channel formed by two covariables. Similarly, r-sync formalises the synchronisation between a branching and a selection that realises the labelled choice. Rules r-res and r-par are contextual rules, which allow reduction to proceed under restriction and parallel composition. Finally, rule r-cong says that reduction is closed under structural congruence: we can use ≡ to expose interactions that match the structure of the rules above.

### **5.2 Typing Rules**

Based on the above, variables P, Q refer to processes, x, y, z range over channels, and T, U, V are states of some fixed but arbitrary session coalgebra (X, c). Variables are associated with these states in a context Γ, as described by Γ ::= ∅ | Γ, x : T. A context is an unordered, finite set of pairs containing at most one pair (x, T) for each variable x. A context is thus isomorphic to a (partial) function from a finite set of variables to their types; we use Γ to denote this function as well: Γ(x) = T if (x, T) ∈ Γ. The domain of a context is defined accordingly.

We know 'un' types are unrestricted, but they are not the only ones.

**Definition 12.** A type is unrestricted, written un(T), if its operation is un, end or bsc. A context is unrestricted, written un(Γ), if all types in Γ are unrestricted, i.e., if (x, T) ∈ Γ implies un(T). A type is linear, written lin(T), if it is not unrestricted. A context is linear, if all its types are linear.

A context Γ may be split into two parts Γ_1 and Γ_2 such that the linear types are strictly divided between Γ_1 and Γ_2, while unrestricted types may be copied to both. Context split is a ternary relation, defined by the rules in Fig. 11. We write Γ_1 ◦ Γ_2 to refer to a context Γ for which Γ = Γ_1 ◦ Γ_2 is in the context split relation. Such a context need not exist for arbitrary Γ_1 and Γ_2;

$$\emptyset = \emptyset \circ \emptyset \qquad \frac{\Gamma = \Gamma_1 \circ \Gamma_2 \quad \mathsf{un}(T)}{\Gamma, x:T = (\Gamma_1, x:T) \circ (\Gamma_2, x:T)}$$

$$\frac{\Gamma = \Gamma_1 \circ \Gamma_2}{\Gamma, x:T = (\Gamma_1, x:T) \circ \Gamma_2} \qquad \frac{\Gamma = \Gamma_1 \circ \Gamma_2}{\Gamma, x:T = \Gamma_1 \circ (\Gamma_2, x:T)}$$

**Figure 11.** Context Split

$$\frac{\mathsf{un}(\Gamma)}{\Gamma \vdash \mathbf{0}}\ [\text{T-Inact}] \qquad \frac{\Gamma, x:T, y:U \vdash P \quad T \perp U}{\Gamma \vdash (\nu xy)P}\ [\text{T-Res}]$$

$$\frac{\Gamma_1 \vdash P \quad \Gamma_2 \vdash Q}{\Gamma_1 \circ \Gamma_2 \vdash P \mid Q}\ [\text{T-Par}] \qquad \frac{\Gamma \vdash P \quad \mathsf{un}(\Gamma)}{\Gamma \vdash\ !P}\ [\text{T-Rep}]$$

$$\frac{c(T) = (?, f) \qquad \Gamma,\ y:U,\ x:f(\ast) \vdash P \qquad f(d) \sqsubseteq U}{\Gamma, x:T \vdash x(y).P}\ [\text{T-In}]$$

$$\frac{c(T) = (!, f) \qquad \Gamma,\ x:f(\ast) \vdash P \qquad U \sqsubseteq f(d)}{\Gamma, x:T, y:U \vdash \overline{x}\langle y\rangle.P}\ [\text{T-Out}]$$

$$\frac{c(T) = (\&, L_1, f) \qquad L_1 \subseteq L_2 \qquad \Gamma, x:f(l) \vdash P_l \quad \forall l \in L_1}{\Gamma, x:T \vdash x \rhd \{l: P_l\}_{l \in L_2}}\ [\text{T-Branch}]$$

$$\frac{c(T) = (\oplus, L, f) \qquad \Gamma, x:f(l) \vdash P_l \qquad l \in L}{\Gamma, x:T \vdash x \lhd l.P_l}\ [\text{T-Sel}]$$

$$\frac{c(T) = (\mathsf{un}, f) \qquad \mathsf{par}(T) \qquad \Gamma,\ x:f(\ast) \vdash P}{\Gamma, x:T \vdash P}\ [\text{T-Unpack}]$$

**Figure 12.** Typing rules

we implicitly assume its existence when writing Γ1 ◦ Γ2. Notice that the use of Γ, x : T in the third rule of Fig. 11 carries the assumption that x is not in Γ. Otherwise, Γ, x : T would contain two pairs for x, which is not allowed.
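Operationally, the split relation can be read as an enumeration procedure: unrestricted entries go to both parts, while each linear entry goes to exactly one side. The following Python sketch is our own illustration, not part of the paper; the function name `splits` and the predicate `is_lin` (standing for lin(·) of Def. 12) are ours.

```python
def splits(ctx, is_lin):
    """Enumerate all pairs (ctx1, ctx2) with ctx = ctx1 ∘ ctx2.

    ctx    -- dict mapping variable names to types
    is_lin -- predicate: is this type linear?
    """
    result = [({}, {})]
    for x, t in ctx.items():
        if not is_lin(t):
            # un(T): copy the entry to both parts
            result = [({**g1, x: t}, {**g2, x: t}) for g1, g2 in result]
        else:
            # lin(T): choose exactly one side for the entry
            result = [r for g1, g2 in result
                        for r in (({**g1, x: t}, g2), (g1, {**g2, x: t}))]
    return result
```

For instance, splitting a context with one linear and one unrestricted variable yields exactly two splits: the linear variable goes left or right, and the unrestricted one appears on both sides.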

The type system is defined by the rules in Fig. 12. A process P is well-formed, under a context Γ, if there is some inference tree whose root is Γ ⊢ P and whose nodes are all valid instantiations of these type rules. As T-Inact is the only rule that does not depend on the correctness of another process, it forms the leaves of such trees. For well-formed processes, the type system guarantees that:


We discuss the typing rules, which can be conveniently read keeping in mind the notation introduced in Def. 3 and Prop. 1. T-Inact ensures that all linear channels in the context are interacted with until their type becomes unrestricted. If our context contains a variable x of type ?int, then the process is required to read an int from it. Thus, x : ?int ⊬ **0**. In contrast, process x(z).**0** is well-formed

for the same context, using T-Inact and T-In:

$$\begin{array}{c} \hline x:\textbf{end},z:\textbf{int}\vdash\textbf{0} \\ \hline x:\textbf{?int}\vdash x(z).\textbf{0} \\ \hline \end{array}$$

T-Res creates a channel by binding together two covariables x and y, of dual type. T-Par causes unrestricted channels to be copied and linear channels to be split between the composed processes, ensuring the latter occur in only a single process. Recall that replication !P is an infinite composition of a single process P; hence, a replicated process can only use unrestricted channels. Together, T-Par and T-Res allow us to introduce new covariables, with new types, and distribute them, but only unrestricted types may be copied. Notice that a process does not specify which types to give the newly bound variables.

$$\begin{array}{rcl} v: \textbf{int} & \vdash & (\nu xy)\,(x(z).\mathbf{0} \mid \overline{y}\langle v\rangle.\mathbf{0}) \\ x: \textbf{un?int} & \vdash & x(z).\mathbf{0} \mid x(z).\mathbf{0} \\ x: \textbf{?int} & \nvdash & x(z).\mathbf{0} \mid x(z).\mathbf{0} \end{array}$$

Each action on a channel has its own rule: T-In handles input, binding the channel x to the continuation type and y to some supertype of the received type. T-Out handles output, which requires the sent variable to have a subtype of whatever type the channel expects to send. T-Branch handles external choice, where the process needs to offer at least all choices the type describes, coupled with processes that are correctly typed under the respective continuation types. T-Sel only checks that the single label chosen by the process is a valid option, and that the rest of the process is correct under the continuation type.

These rules are only specified for linear states; T-Unpack allows a un state to be used as if it were the underlying type, as long as it is parallelizable (Def. 8).

We can actually create structures with un that have no syntactic equivalent. For example, let Tend be a state with σ(Tend) = un and δ(Tend)(∗) = Tend. Just like regular end, Tend allows no interactions on the channel, but it does not cause a un type to be unparallelizable.

**Figure 13.** Session coalgebra using an alternative completed protocol

The diagram in Fig. 13 describes a parallelizable unrestricted state T such that each copy of a channel in state T can only do a single receive. However, because it is unrestricted, we can still copy the channel across threads and read a value

per copy. We can even read infinitely many values through replication.

$$\begin{array}{ccccc} x:T & \vdash & x(y\_1).x(y\_2).x(y\_3).\mathbf{0} \\ x:T & \vdash & x(y\_1).\mathbf{0} & \mid x(y\_2).\mathbf{0} \mid x(y\_3).\mathbf{0} \\ x:T & \vdash & !(x(y).\mathbf{0}) \end{array}$$

Such a type might be interesting in combination with session delegation. A linear session could be established by receiving a channel from an unrestricted channel. By using a structure like T, each thread is guaranteed to establish at most one private session, but there can be many such sessions in parallel threads.

In Sec. 4, we defined simulation through the intuition of subtyping as substitutability in one direction. We see that substitution is indeed allowed for simulated types.

**Theorem 2.** The following, more common, rule is admissible with respect to the rules in Fig. 12.

$$\frac{\Gamma, x:T \vdash P \qquad U \sqsubseteq T}{\Gamma, x:U \vdash P}$$

That is, we could add the rule as an axiom without changing the set of typable processes. As a corollary, bisimilarity of states implies that the states are equivalent with respect to the type system.

**Corollary 2.** For all bisimilar types T ∼ U, contexts Γ, and processes P, it holds that Γ, x : T ⊢ P if and only if Γ, x : U ⊢ P.

# **6 Algorithmic Type Checking**

The type rules describe what well-formed processes look like, but do not directly allow us to decide whether an arbitrary process is well-formed or not. This is because, beforehand, we do not know:


Rather than trying to infer the introduced types, we augment the language of processes with type annotations:

$$P ::= \dots \mid (\nu xy : T) \, P \mid x(y : T) . P$$

We only need to annotate one type for scope restrictions, as we can create the other with the duality function. Other productions are kept unchanged.

When checking a process P | Q, we pass along the entire context to P, keeping track of all linear variables used, and remove those from the context given to Q. To do this we add an output to the algorithm: in an execution Γ1 ⊢ P ; Γ2, the output Γ2 is the subset of Γ1 containing only those variables of the input which

$$\Gamma \div \emptyset = \Gamma \qquad \frac{\Gamma_1 \div F = \Gamma_2, x:T \quad \mathsf{un}(T)}{\Gamma_1 \div (F \cup \{x\}) = \Gamma_2} \qquad \frac{\Gamma_1 \div F = \Gamma_2 \quad x \notin \mathrm{dom}(\Gamma_2)}{\Gamma_1 \div (F \cup \{x\}) = \Gamma_2}$$

**Figure 14.** Context Difference

$$\frac{}{\Gamma \vdash \mathbf{0}\ ;\ \Gamma}\ [\text{A-Inact}] \qquad \frac{\Gamma_1 \vdash P\ ;\ \Gamma_2 \quad \Gamma_1 = \Gamma_2}{\Gamma_1 \vdash\ !P\ ;\ \Gamma_2}\ [\text{A-Rep}]$$

$$\frac{\Gamma_1 \vdash P\ ;\ \Gamma_2 \quad \Gamma_2 \vdash Q\ ;\ \Gamma_3}{\Gamma_1 \vdash P \mid Q\ ;\ \Gamma_3}\ [\text{A-Par}] \qquad \frac{\Gamma_1, x:T, y:\overline{T} \vdash P\ ;\ \Gamma_2}{\Gamma_1 \vdash (\nu xy : T)P\ ;\ \Gamma_2 \div \{x, y\}}\ [\text{A-Res}]$$

$$\frac{c(T) = (?, f) \qquad f(d) \sqsubseteq U \qquad \Gamma_1,\ y:U,\ x:f(\ast) \vdash P\ ;\ \Gamma_2}{\Gamma_1, x:T \vdash x(y : U).P\ ;\ \Gamma_2 \div \{x, y\}}\ [\text{A-In}]$$

$$\frac{c(T) = (!, f) \qquad U \sqsubseteq f(d) \qquad \Gamma_1,\ x:f(\ast) \vdash P\ ;\ \Gamma_2}{\Gamma_1, x:T, y:U \vdash \overline{x}\langle y\rangle.P\ ;\ \Gamma_2 \div \{x\}}\ [\text{A-Out}]$$

$$\frac{c(T) = (\&, L_1, f) \qquad L_1 \subseteq L_2 \qquad \Gamma_1, x:f(l) \vdash P_l\ ;\ \Gamma_l \qquad \Gamma_2 = \Gamma_l \div \{x\} \quad \forall l \in L_2}{\Gamma_1, x:T \vdash x \rhd \{l: P_l\}_{l \in L_2}\ ;\ \Gamma_2}\ [\text{A-Branch}]$$

$$\frac{c(T) = (\oplus, L, f) \qquad l \in L \qquad \Gamma_1, x:f(l) \vdash P_l\ ;\ \Gamma_2}{\Gamma_1, x:T \vdash x \lhd l.P_l\ ;\ \Gamma_2 \div \{x\}}\ [\text{A-Sel}]$$

$$\frac{c(T) = (\mathsf{un}, f) \qquad \mathsf{par}(T) \qquad \Gamma_1, x:f(\ast) \vdash P\ ;\ \Gamma_2}{\Gamma_1, x:T \vdash P\ ;\ (\Gamma_2 \div \{x\}), x:T}\ [\text{A-Unpack}]$$

#### **Figure 15.** Algorithmic Type Checking Rules

had unrestricted types or were not used in P. We say subset because we want these variables, if present, to have the same type in Γ2 as in Γ1.

Figure 15 lists the algorithmic versions of the type rules. A-Par, for example, checks parallel processes as described. By construction, Γ2 is one part of the context split required to instantiate T-Par. The linear variables of the other part are exactly those which are present in Γ1 but not in Γ2. This change in A-Par requires adjusting the other rules. Firstly, we need the algorithm to accept even when we do not fully complete all sessions of Γ1 in P. We do this by unconditionally accepting the terminated process. Note that acceptance of the algorithm now only implies well-formedness if the returned context is unrestricted.
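The input/output threading just described can be made concrete. The following Python sketch is our own illustration, not the paper's algorithm: it covers only a toy fragment (the terminated process and parallel composition, with our own encoding and an `is_lin` predicate), but it shows how A-Inact and A-Par thread contexts and why acceptance is conclusive only once the returned context is unrestricted.

```python
def check(ctx, proc, is_lin):
    """Thread an input context through a toy process and return the
    output context, or None on rejection.  Processes are encoded as
    'nil' (terminated) or ('par', P, Q)."""
    if proc == 'nil':
        # A-Inact: accept unconditionally, returning the input context
        return dict(ctx)
    if proc[0] == 'par':
        # A-Par: the output of checking P becomes the input for Q
        mid = check(ctx, proc[1], is_lin)
        return None if mid is None else check(mid, proc[2], is_lin)
    return None  # remaining constructs omitted in this sketch

def well_formed(ctx, proc, is_lin):
    # Acceptance implies well-formedness only if the returned
    # context is unrestricted.
    out = check(ctx, proc, is_lin)
    return out is not None and not any(is_lin(t) for t in out.values())
```

In this fragment nothing consumes a linear variable, so a context containing one is never well-formed, while a purely unrestricted context is.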

Secondly, the algorithm needs to remove linear variables from the output as we use them. We do not, however, want to remove any variable that has a linear type, as that would allow us to accept processes which do not complete all linear sessions. Thus, we introduce the context difference operator ÷ in Fig. 14. Γ ÷ {x} is the context of all variable/type pairs in Γ minus a potential pair including x, but is only defined if (x, T) ∈ Γ implies that T is unrestricted.
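Under these definitions, the context difference operator admits a direct implementation. The sketch below is our own illustration (the name `ctx_diff` and the `is_lin` predicate, standing for lin(·), are ours); it returns `None` to model the undefined case.

```python
def ctx_diff(ctx, xs, is_lin):
    """Context difference Γ ÷ F of Fig. 14 (sketch).

    Removes each x in xs from ctx; undefined (None) if some x in xs
    is still bound to a linear type, which would mean an unfinished
    linear session."""
    out = dict(ctx)
    for x in xs:
        if x in out:
            if is_lin(out[x]):
                return None  # undefined: linear session not completed
            del out[x]       # unrestricted entry: simply drop it
    return out
```

Variables absent from the context are ignored, mirroring the x ∉ dom(Γ2) rule.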

We elaborate on A-Branch: the algorithm is called once for every branch, yielding a context Γl each time. Excluding x, each branch must use the exact same set of linear variables. Thus, we require that all these contexts are equal up to a potential (x, Ul) pair. Specifically, there is some Γ2 such that Γ2 = Γl ÷ {x} for every l ∈ L2; this Γ2 is the output context.

To motivate this, consider a type T = &{a : Tun, b : end}, where Tun is some unrestricted type distinct from end, and some process P = x ▷ {a : **0**, b : **0**}. Let Γ be some unrestricted context; **0** is well-formed for both Γ, x : Tun and Γ, x : end, and the algorithm agrees:

$$\begin{aligned} &\Gamma, x:T\_{un} \vdash \mathbf{0} \; ; (\Gamma, x:T\_{un}) \\ &\Gamma, x:\text{end} \vdash \mathbf{0} \; ; (\Gamma, x:\text{end}) \end{aligned}$$

The resulting contexts are not equal. Still, P is well-formed for Γ, x : T, so a complete algorithm has to allow x to have different types in the outputs of different branches. A-In, A-Out, and A-Sel do not have multiple branches to check, but the ideas are similar. When introducing a new variable, either through a read or a scope restriction, the new variable is also removed from the output. A-Unpack only unpacks unrestricted types. We want those to have the same type in the input as in the output, so we remove the variable and add a pair with the original type.

Take, for example, the process

$$x \colon \mathbf{?int}, \ y \colon \mathbf{?int} \quad \vdash \quad x(z\_1) . \mathbf{0} \mid y(z\_2) . \mathbf{0}$$

The variables are split correctly, and both split contexts are unrestricted when the process is completed, thus it is well-formed.

If, on the other hand, the left process did not complete the linear session, then the context difference would not have been defined. Take one such process:

$$x: \textbf{?int.?int},\ y: \textbf{?int} \quad \nvdash \quad x(z_1).\mathbf{0} \mid y(z_2).\mathbf{0}$$

We succeed in checking the terminated process of the left part.

$$x: \textbf{?int},\ y: \textbf{?int} \ \vdash\ \mathbf{0}\ ;\ (x: \textbf{?int},\ y: \textbf{?int})$$

But x has a linear type in the output. (x : ?int, y : ?int) ÷ {x} is undefined, so the algorithm rejects this input entirely. The process was indeed not well-formed, and no further parallel processes could fix it; the rejection is expected.

For each process and context there is at most one applicable algorithmic rule: which one applies is directed by the process syntax and the unrestrictedness of the channel being interacted with.

Under the same assumptions as before (i.e., the session coalgebra describing the types is finitely generated), this induced type checking algorithm is decidable, sound, and complete with respect to the type rules defined in Sec. 5.

**Theorem 3 (Decidability).** The type checking algorithm terminates in finite time for every input, assuming a finitely generated session coalgebra.

Having defined algorithmic type checking, we can go back to the language that we used to define our typing rules by erasing the type annotations in the input and restriction operators. Let erase(·) denote a function on processes defined as

$$\begin{aligned} \operatorname{erase}((\nu xy : T)\,Q) &= (\nu xy)\,\operatorname{erase}(Q) \\ \operatorname{erase}(x(y : T).Q) &= x(y).\operatorname{erase}(Q) \end{aligned}$$

and as a homomorphism on the remaining process constructs. We have:
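Erasure is a plain structural recursion. The following Python sketch is our own illustration (the tuple-based process encoding is ours, not the paper's): the two annotated constructs drop their type, and every other construct is rebuilt homomorphically.

```python
def erase(proc):
    """Drop type annotations from restriction and input; act
    homomorphically on every other construct.  Processes are tuples
    whose first element is a tag (our own encoding)."""
    tag = proc[0]
    if tag == 'res':  # (νxy : T) Q  ->  (νxy) Q
        _, x, y, _T, q = proc
        return ('res', x, y, erase(q))
    if tag == 'in':   # x(y : T).Q  ->  x(y).Q
        _, x, y, _T, q = proc
        return ('in', x, y, erase(q))
    # homomorphism: recurse into subprocesses, keep everything else
    return tuple(erase(p) if isinstance(p, tuple) else p for p in proc)
```

For example, erasing an annotated restriction around an annotated input yields the same process with both annotations removed.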

**Theorem 4 (Correctness).** For any context Γ1 and annotated process P, Γ1 ⊢ erase(P) iff Γ1 ⊢ P ; Γ2 with un(Γ2) for some Γ2.

# **7 Concluding Remarks**

We have developed a new, language-independent foundation for session types by relying on coalgebras. We introduced session coalgebras, which elegantly capture all communication structures of session types, both linear and unrestricted, without committing to a specific syntactic formulation for processes and types. Session coalgebras allow us to rediscover language-independent coinductive definitions for duality, subtyping, and type equivalence. A key idea is to assimilate channel types to the states of a session coalgebra; we demonstrated this insight by deriving a session type system for the π-calculus, which revisits and extends that by Vasconcelos [37], unlocking decidability results and algorithmic type checking.

Interesting strands for future work include extending our coalgebraic toolbox so as to give a language-independent justification to advanced session type systems, such as context-free session types [35] and multiparty session types [21]. Another line concerns extending our coalgebraic view to include language-dependent issues and properties that require a global analysis on session behaviours. Salient examples are liveness properties such as (dead)lock-freedom and progress: advanced type systems [23,29,28,8] typically couple (session) types with advanced mechanisms (such as priority-based annotations and strict partial orders), which provide a global insight to rule out the circular dependencies between sessions that are at the heart of stuck processes. Lastly, the whole area of coalgebra now becomes available to explore session types. One possible direction is to make use of final coalgebras and modal logic, which would allow us to analyse the behaviour of session coalgebras. This would be particularly powerful in combination with composition operations for session coalgebras to break down protocols and type checking. Another direction is to use session coalgebras to verify other coalgebras that take on the role of the syntactic π-calculus [12,27] and thereby allowing also for the exploration of other semantics like manifest sharing [1,2] without resorting to a specific syntax.

Acknowledgements We are grateful to the anonymous reviewers for their useful remarks and suggestions. Pérez has been partially supported by the Dutch Research Council (NWO) under project No. 016.Vidi.189.046 (Unifying Correctness for Communicating Software).

# **References**


Revised Selected Papers. Lecture Notes in Computer Science, vol. 7176, pp. 2–16. Springer (2011). https://doi.org/10.1007/978-3-642-29834-9 2



# **Correctness of Sequential Monte Carlo Inference for Probabilistic Programming Languages**

Daniel Lundén<sup>1</sup>, Johannes Borgström<sup>2</sup>, and David Broman<sup>1</sup>

> <sup>1</sup> Digital Futures and EECS, KTH Royal Institute of Technology, Stockholm, Sweden — {dlunde,dbro}@kth.se
> <sup>2</sup> Uppsala University, Uppsala, Sweden — johannes.borgstrom@it.uu.se

**Abstract.** Probabilistic programming is an approach to reasoning under uncertainty by encoding inference problems as programs. In order to solve these inference problems, probabilistic programming languages (PPLs) employ different inference algorithms, such as sequential Monte Carlo (SMC), Markov chain Monte Carlo (MCMC), or variational methods. Existing research on such algorithms mainly concerns their implementation and efficiency, rather than the correctness of the algorithms themselves when applied in the context of expressive PPLs. To remedy this, we give a correctness proof for SMC methods in the context of an expressive PPL calculus, representative of popular PPLs such as WebPPL, Anglican, and Birch. Previous work has studied correctness of MCMC using an operational semantics, and correctness of SMC and MCMC in a denotational setting without term recursion. However, for SMC inference—one of the most commonly used algorithms in PPLs as of today—no formal correctness proof exists in an operational setting. In particular, an open question is whether the resample locations in a probabilistic program affect the correctness of SMC. We solve this fundamental problem, and make four novel contributions: (i) we extend an untyped PPL lambda calculus and operational semantics to include explicit resample terms, expressing synchronization points in SMC inference; (ii) we prove, for the first time, that subject to mild restrictions, any placement of the explicit resample terms is valid for a generic form of SMC inference; (iii) as a result of (ii), our calculus benefits from classic results from the SMC literature: a law of large numbers and an unbiased estimate of the model evidence; and (iv) we formalize the bootstrap particle filter for the calculus and discuss how our results can be further extended to other SMC algorithms.

**Keywords:** Probabilistic Programming · Sequential Monte Carlo · Operational Semantics · Functional Programming · Measure Theory

This project is financially supported by the Swedish Foundation for Strategic Research (ASSEMBLE RIT15-0012) and the Swedish Research Council (grant 2013-4853).

© The Author(s) 2021

N. Yoshida (Ed.): ESOP 2021, LNCS 12648, pp. 404–431, 2021. https://doi.org/10.1007/978-3-030-72019-3_15

# **1 Introduction**

Probabilistic programming is a programming paradigm for probabilistic models, encompassing a wide range of programming languages, libraries, and platforms [5,13,14,25,32,37,38]. Such probabilistic models are typically created to express inference problems, which are ubiquitous and highly significant in, for instance, machine learning [1], artificial intelligence [31], phylogenetics [29,30], and topic modeling [2].

In order to solve such inference problems, an inference algorithm is required. Common general-purpose algorithm choices for inference problems include sequential Monte Carlo (SMC) methods [9], Markov chain Monte Carlo (MCMC) methods [12], and variational methods [42]. In traditional settings, correctness results for such algorithms often come in the form of laws of large numbers, central limit theorems, or optimality arguments. However, for general-purpose probabilistic programming languages (PPLs), the emphasis has predominantly been on algorithm implementations and their efficiency [14,25,37], rather than the correctness of the algorithms themselves. In particular, explicit connections between traditional theoretical SMC results and PPL semantics have been limited. In this paper, we bridge this gap by formally connecting fundamental SMC results to the context of an expressive PPL calculus.

Essentially, SMC works by simulating many executions of a probabilistic program concurrently, occasionally resampling the different executions. In this resampling step, SMC discards less likely executions, and replicates more likely executions, while remembering the average likelihood at each resampling step in order to estimate the overall likelihood. In expressive PPLs, there is freedom in choosing where in a program this resampling occurs. For example, most SMC implementations, such as WebPPL [14], Anglican [43], and Birch [25], always resample when all executions have reached a call to the weighting construct in the language. At possible resampling locations, Anglican takes a conservative approach by dynamically checking at runtime whether all executions have stopped at a weighting construct, or all have finished. If neither case applies, Anglican reports a runtime error. In contrast, WebPPL does not perform any checks and simply includes the executions that have finished in the resampling step. There are also heuristic approaches [21] that automatically align resampling locations in programs, ensuring that all executions finish after encountering the same number of them. The motivations for using the above approaches are all based on experimental validation. As such, an open research problem is whether there are any inherent restrictions when selecting resampling locations, or if the correctness of SMC is independent of this selection. This is not only important theoretically to guarantee the correctness of inference results, but also for inference performance, both because inference performance depends on the placement of resampling locations [21] and because dynamic checks incur direct runtime overhead. We address this research problem in this paper.
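The resampling step described above can be summarized generically: replace the weighted particle population by an equally weighted one drawn in proportion to the weights, and record the mean weight, whose running product over resampling steps estimates the overall likelihood. A minimal Python sketch, our own illustration rather than any of the cited implementations (it uses multinomial resampling; the name `resample` is ours):

```python
import random

def resample(particles, weights):
    """One SMC resampling step: discard unlikely executions,
    replicate likely ones, and return the mean weight (one factor
    of the model-evidence estimate)."""
    n = len(particles)
    mean_w = sum(weights) / n
    # multinomial resampling: draw n particles proportionally to weight
    new = random.choices(particles, weights=weights, k=n)
    return new, [1.0] * n, mean_w
```

A particle with weight zero is never replicated, and the product of the returned `mean_w` factors over all resampling steps yields the likelihood estimate mentioned above.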

In the following, we give an overview of the paper and our contributions. In Section 2, we begin by giving a motivating example from phylogenetics, illustrating the usefulness of our results. Next, in Section 3, we define the syntax and operational semantics of an expressive functional PPL calculus based on the operational formalization in Borgström et al. [3], representative of common PPLs. The operational semantics assigns to each pair of a term **t** and an initial random trace (a sequence of random samples) a non-negative weight. This weight is accumulated during evaluation through a weight construct, which, in current calculi and implementations of SMC, is (implicitly) always followed by a resampling. To decouple resampling from weighting, we present our first contribution.

(i) We extend the calculus from Borgström et al. [3] to include explicit resample terms, expressing explicit synchronization points for performing resampling in SMC. With this extension, we also define a semantics which limits the number of evaluated resample terms, laying the foundation for the remaining contributions.

In Section 4, we define the probabilistic semantics of the calculus. The weight from the operational semantics is used to define unnormalized distributions **t** over traces and **<sup>t</sup>** over result terms. The measure **<sup>t</sup>** is called the target measure, and finding a representation of this is the main objective of inference algorithms.

We give a formal definition of SMC inference based on Chopin [6] in Section 5. This includes both a generic SMC algorithm, and two standard correctness results from the SMC literature: a law of large numbers [6], and the unbiasedness of the likelihood estimate [26].

In Section 6, we proceed to present the main contributions.

(ii) From the SMC formulation by Chopin [6], we formalize a sequence of distributions **t**n, indexed by n, such that **t**n allows for evaluating at most n resamples. This sequence is determined by the placement of resamples in **t**. Our first result is Theorem 1, showing that **t**n eventually equals **t** if the number of calls to resample is upper bounded. Because of the explicit resample construct, this also implies that, for all resample placements such that the number of calls to resample is upper bounded, **t**n eventually equals **t**. We further relax the finite upper bound restriction and investigate under which conditions limn→∞ **t**n = **t** pointwise. In particular, we relate this equality to the dominated convergence theorem in Theorem 2, which states that the limit converges as long as there exists a function dominating the weights encountered during evaluation. This gives an alternative set of conditions under which **t**n converges to **t** (now asymptotically, in the number of resamplings n).

The contribution is fundamental, in that it provides us with a sequence of approximating distributions **t**n of **t** that can be targeted by the SMC algorithm of Section 5. As a consequence, we can extend the standard correctness results of that section to our calculus. This is our next contribution.

(iii) Given a suitable sequence of transition kernels (ways of moving between the **t**n), we can correctly approximate **t**n with the SMC algorithm from Section 5. The approximation is correct in the sense of Section 5: the law of large numbers and the unbiasedness of the likelihood estimate hold. As a consequence of (ii), SMC also correctly approximates **t**, and in turn the target measure. Crucially, this also means estimating the model evidence (likelihood), which allows for compositionality [15] and comparisons between different models [30]. This contribution is summarized in Theorem 3.

Related to the above contributions, Ścibior et al. [33] formalize SMC and MCMC inference as transformations over monadic inference representations using a denotational approach (in contrast to our operational approach). They prove that their SMC transformations preserve the measure of the initial representation of the program (i.e., the target measure). Furthermore, their formalization is based on a simply-typed lambda calculus with primitive recursion, while our formalization is based on an untyped lambda calculus which naturally supports full term recursion. Our approach is also rather more elementary, only requiring basic measure theory compared to the relatively heavy mathematics (category theory and synthetic measure theory) used by them. Regarding generalizability, their approach is both general and compositional in the different inference transformations, while we abstract over parts of the SMC algorithm. This allows us, in particular, to relate directly to standard SMC correctness results.

Section 7 concerns the instantiation of the transition kernels from (iii), and also discusses other SMC algorithms. Our last contribution is the following.

(iv) We define a sequence of sub-probability kernels k**<sup>t</sup>**,n induced by a given program **t**, corresponding to the fundamental SMC algorithm known as the bootstrap particle filter (BPF) for our calculus. This is the most common version of SMC, and we present a concrete SMC algorithm corresponding to these kernels. We also discuss other SMC algorithms and their relation to our formalization: the resample-move [11], alive [19], and auxiliary [28] particle filters.

Importantly, by combining the above contributions, we justify that the implementation strategies of the BPFs in WebPPL, Anglican, and Birch are indeed correct. In fact, our results show that the strategy in Anglican, in which every evaluation path must resample the same number of times, is too conservative.

An extended version of this paper is also available [20]. This extended version includes rigorous definitions and detailed proofs for many lemmas found in the paper, as well as further examples and comments. The lemmas proved in the extended version are explicitly marked with †.

# **2 A Motivating Example from Phylogenetics**

In this section, we give a motivating example from phylogenetics. The example is written in a functional PPL<sup>3</sup> developed as part of this paper, in order to verify

<sup>3</sup> The implementation is an interpreter written in OCaml. It largely follows the same approach as Anglican and WebPPL, and uses continuation-passing style in order to

```
let tree = {
  left:  {left:{age:0}, right:{age:0}, age:4},
  right: {left:{age:0}, right:{age:0}, age:6},
  age: 10
} in

let lambda = 0.2 in let mu = 0.1 in

let crbdGoesExtinct startTime =
  let curTime = startTime
    - (sample (exponential (lambda + mu)))
  in
  if curTime < 0 then false
  else
    let speciation = sample
      (bernoulli (lambda / (lambda + mu))) in
    if !speciation then true
    else crbdGoesExtinct curTime
      && crbdGoesExtinct curTime in

let simBranch startTime stopTime =
  let curTime = startTime -
    sample (exponential lambda) in
  if curTime < stopTime then ()
  else if not (crbdGoesExtinct curTime)
  then weight (log 0) // #1
  else (weight (log 2); // #2
    simBranch curTime stopTime) in

let simTree tree parent =
  let w = -mu * (parent.age - tree.age) in
  weight w; // #3
  simBranch parent.age tree.age;
  match tree with
  | {left,right,age} ->
    simTree left tree; simTree right tree
  | {age} -> () in

simTree tree.left tree;
simTree tree.right tree
```
Fig. 1: A simplified version of a phylogenetic birth-death model from [30]. See the text for a description.

and experiment with the presented concepts and results. In particular, this PPL supports SMC inference (Algorithm 2) with decoupled resamples and weights<sup>4</sup>, as well as sampling from random distributions with a sample construct.

Consider the program in Fig. 1, encoding a simplified version of a phylogenetic birth-death model (see Ronquist et al. [30] for the full version). The problem is to find the model evidence for a particular birth rate (lambda = 0.2) and death rate (mu = 0.1), given an observed phylogenetic tree. The tree represents known lineages of evolution, where the leaves are extant (surviving to the present) species. Most importantly, for illustrating the usefulness of the results in this paper, the recursive function simBranch, with its two weight applications #1 and #2, is called a random number of times for each branch in the observed tree. Thus, different SMC executions encounter differing numbers of calls to weight. When resampling is performed after every call to weight (#1, #2, and #3), it is, because of the differing numbers of resamples, not obvious that inference is correct (e.g., the equivalent program in Anglican gives a runtime error). Our results show that such a resampling strategy is indeed correct.
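To make the varying branching structure concrete, the following Python sketch (our own illustration; the paper's PPL is an OCaml-based language, not Python, and names like `crbd_goes_extinct` are ours) mimics `crbdGoesExtinct` and `simBranch` from Fig. 1 and records how many `weight` calls (#1 and #2) each execution makes:

```python
import math
import random

LAMBDA, MU = 0.2, 0.1

def crbd_goes_extinct(start_time, rng):
    """Does a lineage starting at start_time die out before the present (time 0)?"""
    cur_time = start_time - rng.expovariate(LAMBDA + MU)
    if cur_time < 0:
        return False                       # lineage survives to the present
    if rng.random() >= LAMBDA / (LAMBDA + MU):
        return True                        # death event: lineage is extinct
    # Speciation: both daughter lineages must go extinct.
    return crbd_goes_extinct(cur_time, rng) and crbd_goes_extinct(cur_time, rng)

def sim_branch(start_time, stop_time, rng, log_weights):
    """Walk a branch, recording one log-weight per hidden speciation event."""
    cur_time = start_time - rng.expovariate(LAMBDA)
    if cur_time < stop_time:
        return
    if not crbd_goes_extinct(cur_time, rng):
        log_weights.append(float('-inf'))  # corresponds to weight (log 0): #1
        return
    log_weights.append(math.log(2))        # corresponds to weight (log 2): #2
    sim_branch(cur_time, stop_time, rng, log_weights)

# The number of weight calls differs between executions:
counts = []
for seed in range(200):
    ws = []
    sim_branch(10.0, 4.0, random.Random(seed), ws)
    counts.append(len(ws))
```

Across seeded runs the number of recorded weights varies, which is precisely why resampling after every `weight` yields differing numbers of resamples per SMC execution.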

This strategy is far from optimal, however. For instance, only resampling at #3, which is encountered the same number of times in each execution, performs much better [21,30]. Our results show that this is correct as well, and that it gives the same asymptotic results as the naive strategy in the previous paragraph.

Another strategy is to resample only at #1 and #3, again causing executions to encounter differing numbers of resamples. Because #1 weights with (log) 0, this

pause and resume executions as part of inference. It is available at https://github. com/miking-lang/miking-dppl/tree/pplcore. The example in Fig. 1 can be found under examples/crbd/crbd-esop.ppl

<sup>4</sup> The implementation uses log weights as arguments to weight for numerical reasons.

approach gives the same accuracy as resampling only at #3, but avoids useless computation, since a zero-weight execution can never obtain non-zero weight. Equivalently to resampling at #1, zero-weight executions can also be identified and stopped automatically at runtime. This gives a direct performance gain, and both variations are correct by our results. We compared the three strategies above for SMC inference with 50 000 particles<sup>5</sup>: resampling at #1, #2, and #3 resulted in a runtime of 15.0 seconds; resampling only at #3, in a runtime of 12.6 seconds; and resampling at #1 and #3, in a runtime of 11.2 seconds. Furthermore, resampling at #1, #2, and #3 resulted in significantly worse accuracy compared to the other two strategies [21,30].

Summarizing the above, the results in this paper ensure correctness when exploring different resampling placement strategies. As just demonstrated, this is useful, because resampling strategies can have a large impact on SMC accuracy and performance.
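To see in miniature why different resampling placements agree asymptotically, consider a two-step SMC evidence estimator in Python (a toy model of our own, not the paper's implementation): the model draws $x \sim \text{Uniform}(0,1)$ and applies weights $x$ and $1-x$, so the true evidence is $\mathbb{E}[x(1-x)] = 1/6$.

```python
import random

def smc_evidence(n, resample, rng):
    """Estimate model evidence with n particles, with or without resampling."""
    # Step 1: propose particles and apply the first weight (w = x).
    xs = [rng.random() for _ in range(n)]
    w1 = list(xs)
    z1 = sum(w1) / n
    if resample:
        # Multinomial resampling: redraw particles proportionally to w1,
        # then reset all weights to 1.
        xs = rng.choices(xs, weights=w1, k=n)
        w1 = [1.0] * n
    # Step 2: apply the second weight (w = 1 - x).
    w2 = [a * (1 - x) for a, x in zip(w1, xs)]
    z2 = sum(w2) / n
    # The evidence estimate is the product of the stage-wise mean weights.
    return z1 * z2 if resample else z2

rng = random.Random(0)
z_no = smc_evidence(100_000, False, rng)  # no resampling
z_rs = smc_evidence(100_000, True, rng)   # resample between the two weights
```

Both estimators converge to the same evidence $1/6 \approx 0.167$; what differs in general is their variance, which is why placement matters for accuracy and performance.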

# **3 A Calculus for Probabilistic Programming Languages**

In this section, we define the calculus used throughout the paper. In Section 3.1, we begin by defining the syntax, and demonstrate how a simple probability distribution can be encoded using it. In Section 3.2, we define the semantics and demonstrate it on the previously encoded probability distribution. This semantics is used in Section 4 to define the target measure for any given program. In Section 3.3, we extend the semantics of Section 3.2 to limit the number of allowed resamples in an evaluation. This extended semantics forms the foundation for formalizing SMC in Sections 6 and 7.

### **3.1 Syntax**

The main difference between the calculus presented in this section and the standard untyped lambda calculus is the addition of real numbers, functions operating on real numbers, a sampling construct for drawing random values from real-valued probability distributions, and a construct for weighting executions. The rationale for these additions is that, in addition to discrete probability distributions, continuous distributions are ubiquitous in most real-world models, and the weighting construct is essential for encoding inference problems. In order to define the calculus, we let $X$ be a countable set of variable names; $D \in \mathbb{D}$ range over a countable set $\mathbb{D}$ of identifiers for families of probability distributions over $\mathbb{R}$, where the family for each identifier $D$ has a fixed number of real parameters $|D|$; and $g \in \mathbb{G}$ range over a countable set $\mathbb{G}$ of identifiers for real-valued functions with respective arities $|g|$. More precisely, for each $g$, there is a measurable function $\sigma_g : \mathbb{R}^{|g|} \to \mathbb{R}$. For simplicity, we often use $g$ to denote both the identifier and its measurable function. We can now give an inductive definition of the abstract syntax, consisting of values **v** and terms **t**.

<sup>5</sup> We repeated each experiment 20 times on a machine running Ubuntu 20.04 with an Intel i5-2500K CPU (4 cores) and 8GB memory. The standard deviation was under 0.1 seconds in all three cases.

$$\begin{array}{ll}
\text{(a)} & \mathtt{sample}_{Beta}(2,2) \\[1.5ex]
\text{(b)} &
\begin{array}{l}
\mathtt{let}\ p = \mathtt{sample}_{Beta}(2,2)\ \mathtt{in} \\
\mathtt{let}\ \mathit{observe}\ o = \mathtt{weight}(f_{Bern}(p, o))\ \mathtt{in} \\
\mathit{iter}\ \mathit{observe}\ [\mathit{true}, \mathit{false}, \mathit{true}];\ p
\end{array}
\end{array}$$

Fig. 2: The Beta(2, 2) distribution as a program in (a), and visualized with a solid line in (c). Also, the program **t**obs in (b), visualized with a dashed line in (c). The iter function in (b) simply maps the given function over the given list and returns (). That is, it calls observe true, observe false, and observe true purely for the side-effect of weighting.

### **Definition 1.**

$$\mathbf{v} ::= c \mid \lambda x.\mathbf{t} \qquad
\begin{aligned}
\mathbf{t} ::= {} & \mathbf{v} \mid x \mid \mathbf{t}\ \mathbf{t} \mid \mathtt{if}\ \mathbf{t}\ \mathtt{then}\ \mathbf{t}\ \mathtt{else}\ \mathbf{t} \mid g(\mathbf{t}_1, \dots, \mathbf{t}_{|g|}) \\
& \mid \mathtt{sample}_D(\mathbf{t}_1, \dots, \mathbf{t}_{|D|}) \mid \mathtt{weight}(\mathbf{t}) \mid \mathtt{resample}
\end{aligned} \tag{1}$$

Here, <sup>c</sup> <sup>∈</sup> <sup>R</sup>, <sup>x</sup> <sup>∈</sup> <sup>X</sup>, <sup>D</sup> <sup>∈</sup> <sup>D</sup>, <sup>g</sup> <sup>∈</sup> <sup>G</sup>. We denote the set of all terms by <sup>T</sup> and the set of all values by V.

The formal semantics is given in Section 3.2. Here, we instead give an informal description of the various language constructs.

Some examples of distribution identifiers are $\mathcal{N} \in \mathbb{D}$, the identifier for the family of normal distributions, and $\mathcal{U} \in \mathbb{D}$, the identifier for the family of continuous uniform distributions. The semantics of the term $\mathtt{sample}_{\mathcal{N}}(0, 1)$ is, informally, "draw a random sample from the normal distribution with mean 0 and variance 1". The weight construct is illustrated later in this section, and we discuss the resample construct in detail in Sections 3.3 and 6.

We use common syntactic sugar throughout the paper. Most importantly, we use $\mathit{false}$ and $\mathit{true}$ as aliases for 0 and 1, respectively, and $()$ (unit) as another alias for 0. Furthermore, we often write $g \in \mathbb{G}$ as infix operators. For instance, $1 + 2$ is a valid term, where $+ \in \mathbb{G}$. Now, let $\mathbb{R}_+$ denote the non-negative reals. We define $f_D : \mathbb{R}^{|D|+1} \to \mathbb{R}_+$ as the function $f_D \in \mathbb{G}$ such that $f_D(c_1, \dots, c_{|D|}, \cdot)$ is the probability density function (continuous distributions) or probability mass function (discrete distributions) for the probability distribution corresponding to $D \in \mathbb{D}$ and $(c_1, \dots, c_{|D|})$. For example, $f_{\mathcal{N}}(0, 1, x) = \frac{1}{\sqrt{2\pi}} \cdot e^{-\frac{1}{2}x^2}$ is the probability density function of the normal distribution with mean 0 and variance 1. Lastly, we will also use let bindings, let rec bindings, sequencing using ;, and lists (all of which can be encoded in the calculus). Sequencing is required for the side-effects produced by weight (see Definition 5) and resample (see Sections 3.3 and 6).

We now consider an example, which we revisit in Sections 3.2 and 4.3 to illustrate the semantics and the target measure, respectively. Here, we first give the syntax and informally visualize the probability distributions (i.e., the target measures, as we will see in Section 4.3) for the example. Consider first the program in Fig. 2a, directly encoding the Beta(2, 2) distribution, illustrated in Fig. 2c. This distribution naturally represents the uncertainty in the bias of a coin: in this case, the coin is most likely unbiased (bias 0.5), and biases closer to 0 and 1 are less likely. In Fig. 2b, we extend Fig. 2a by observing the sequence $[\mathit{true}, \mathit{false}, \mathit{true}]$ when flipping the coin. These observations are encoded using the weight construct, which simply accumulates (as a side-effect) a product of all real-valued arguments given to it throughout the execution. First, recall the standard mass function for the Bernoulli distribution, corresponding to $f_{Bern} \in \mathbb{G}$: $\sigma_{f_{Bern}}(p, \mathit{true}) = p$; $\sigma_{f_{Bern}}(p, \mathit{false}) = 1 - p$; $\sigma_{f_{Bern}}(p, x) = 0$ otherwise. The observations $[\mathit{true}, \mathit{false}, \mathit{true}]$ are encoded using the observe function, which internally uses the weight construct to weight the current value $p$ according to the Bernoulli mass function. As an example, assume we have drawn $p = 0.4$. The weight for this execution is $\sigma_{f_{Bern}}(0.4, \mathit{true}) \cdot \sigma_{f_{Bern}}(0.4, \mathit{false}) \cdot \sigma_{f_{Bern}}(0.4, \mathit{true}) = 0.4^2 \cdot 0.6$. Now consider $p = 0.6$ instead. For this value of $p$, the weight is instead $0.6^2 \cdot 0.4$. This explains the shift in Fig. 2c: a bias closer to 1 is more likely, since we have observed two $\mathit{true}$ flips, but only one $\mathit{false}$.
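The weight computation for Fig. 2b can be replayed directly. The small Python sketch below (function names are our own) reproduces the two example weights computed above:

```python
def f_bern(p, x):
    """Bernoulli mass function sigma_{f_Bern}(p, x) from Section 3.1."""
    if x is True:
        return p
    if x is False:
        return 1 - p
    return 0.0

def execution_weight(p, observations):
    """weight accumulates a product of all real arguments passed to it."""
    w = 1.0
    for o in observations:
        w *= f_bern(p, o)
    return w

obs = [True, False, True]
w04 = execution_weight(0.4, obs)   # 0.4 * 0.6 * 0.4 = 0.096
w06 = execution_weight(0.6, obs)   # 0.6 * 0.4 * 0.6 = 0.144
```

As in the text, the weight for $p = 0.6$ exceeds that for $p = 0.4$, reflecting the two observed $\mathit{true}$ flips.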

### **3.2 Semantics**

In this section, we define the semantics of our calculus. The definition is split into two parts: a deterministic semantics and a stochastic semantics. We use evaluation contexts to assist in defining our semantics. The evaluation contexts **E** induce a call-by-value semantics, and are defined as follows.

### **Definition 2.**

$$\begin{aligned}
\mathbf{E} ::= {} & [\cdot] \mid \mathbf{E}\ \mathbf{t} \mid (\lambda x.\mathbf{t})\ \mathbf{E} \mid \mathtt{if}\ \mathbf{E}\ \mathtt{then}\ \mathbf{t}\ \mathtt{else}\ \mathbf{t} \\
& \mid g(c_1, \dots, c_m, \mathbf{E}, \mathbf{t}_{m+2}, \dots, \mathbf{t}_{|g|}) \\
& \mid \mathtt{sample}_D(c_1, \dots, c_m, \mathbf{E}, \mathbf{t}_{m+2}, \dots, \mathbf{t}_{|D|}) \mid \mathtt{weight}(\mathbf{E})
\end{aligned} \tag{2}$$

We denote the set of all evaluation contexts by E.

With the evaluation contexts in place, we proceed to define the deterministic semantics through a small-step relation →Det.

### **Definition 3.**

$$\begin{gathered}
\frac{}{\mathbf{E}[(\lambda x.\mathbf{t})\ \mathbf{v}] \to_{\text{Det}} \mathbf{E}[[x \mapsto \mathbf{v}]\mathbf{t}]}\ \text{(App)} \qquad
\frac{c = \sigma_g(c_1, \dots, c_{|g|})}{\mathbf{E}[g(c_1, \dots, c_{|g|})] \to_{\text{Det}} \mathbf{E}[c]}\ \text{(Prim)} \\[1.5ex]
\frac{}{\mathbf{E}[\mathtt{if}\ \mathit{true}\ \mathtt{then}\ \mathbf{t}_1\ \mathtt{else}\ \mathbf{t}_2] \to_{\text{Det}} \mathbf{E}[\mathbf{t}_1]}\ \text{(IfTrue)} \\[1.5ex]
\frac{}{\mathbf{E}[\mathtt{if}\ \mathit{false}\ \mathtt{then}\ \mathbf{t}_1\ \mathtt{else}\ \mathbf{t}_2] \to_{\text{Det}} \mathbf{E}[\mathbf{t}_2]}\ \text{(IfFalse)}
\end{gathered} \tag{3}$$

The rules are straightforward, and will not be discussed in further detail here. We use the standard notation for reflexive transitive closures (e.g., $\to^*_{\text{Det}}$) and transitive closures (e.g., $\to^+_{\text{Det}}$) of relations throughout the paper.

Following the tradition of Kozen [18] and Park et al. [27], sampling in our stochastic semantics works by consuming randomness from a tape of real numbers. We use inverse transform sampling, and therefore the tape consists of numbers from the interval $[0, 1]$. In order to use inverse transform sampling, we require that for each $D \in \mathbb{D}$, there exists a measurable function $F_D^{-1} : \mathbb{R}^{|D|} \times [0, 1] \to \mathbb{R}$, such that $F_D^{-1}(c_1, \dots, c_{|D|}, \cdot)$ is the inverse cumulative distribution function for the probability distribution corresponding to $D$ and $(c_1, \dots, c_{|D|})$. We call the tape of real numbers a trace, and make the following definition.

**Definition 4.** Let $\mathbb{N}_0 = \mathbb{N} \cup \{0\}$. The set of all traces is $\mathbb{S} = \bigcup_{n \in \mathbb{N}_0} [0, 1]^n$.

We use the notation $(c_1, c_2, \dots, c_n)_{\mathbb{S}}$ to indicate the trace consisting of the $n$ numbers $c_1, c_2, \dots, c_n$. Given a trace $s$, we denote by $|s|$ the length of the trace. We denote the concatenation of two traces $s$ and $s'$ by $s * s'$. Lastly, we let $c :: s$ denote the extension of the trace $s$ with the real number $c$ as head.
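As a concrete instance of inverse transform sampling driven by a trace, the exponential distribution has the closed-form inverse CDF $F^{-1}(\lambda, u) = -\ln(1-u)/\lambda$. The Python sketch below (helper names are our own) consumes the head of a trace exactly as sampling in the stochastic semantics does:

```python
import math
import random

def inv_cdf_exponential(rate, u):
    """F^{-1}_Exp(rate, u): inverse CDF of the exponential distribution."""
    return -math.log(1 - u) / rate

# Consuming the head of a trace yields one sample; the tail remains for
# subsequent samples.
trace = [0.5, 0.9]
c, trace = inv_cdf_exponential(2.0, trace[0]), trace[1:]

# Sanity check: pushing Uniform(0,1) randomness through F^{-1} reproduces
# the exponential's mean 1/rate.
rng = random.Random(42)
mean = sum(inv_cdf_exponential(2.0, rng.random()) for _ in range(100_000)) / 100_000
```

Here `c` is the sample drawn from the trace head $0.5$, i.e., the median $-\ln(0.5)/2$ of the Exp(2) distribution.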

With the traces and $F_D^{-1}$ defined, we can proceed to the stochastic<sup>6</sup> semantics $\to$ over $\mathbb{T} \times \mathbb{R}_+ \times \mathbb{S}$.

### **Definition 5.**

$$\mathbf{t}_{stop} ::= \mathbf{v} \mid \mathbf{E}[\mathtt{sample}_D(c_1, \dots, c_{|D|})] \mid \mathbf{E}[\mathtt{weight}(c)] \mid \mathbf{E}[\mathtt{resample}] \tag{4}$$

$$\frac{\mathbf{t} \to^+_{\text{Det}} \mathbf{t}_{stop}}{\mathbf{t}, w, s \to \mathbf{t}_{stop}, w, s}\ \text{(Det)} \qquad \frac{c \geq 0}{\mathbf{E}[\mathtt{weight}(c)], w, s \to \mathbf{E}[()], w \cdot c, s}\ \text{(Weight)}$$

$$\frac{c = F_D^{-1}(c_1, \dots, c_{|D|}, p)}{\mathbf{E}[\mathtt{sample}_D(c_1, \dots, c_{|D|})], w, p :: s \to \mathbf{E}[c], w, s}\ \text{(Sample)} \tag{5}$$

$$\frac{}{\mathbf{E}[\mathtt{resample}], w, s \to \mathbf{E}[()], w, s}\ \text{(Resample)}$$

The rule (Det) encapsulates the <sup>→</sup>Det relation, and states that terms can move deterministically only to terms of the form **t**stop. Note that terms of the form **t**stop are found at the left-hand side in the other rules. The (Sample) rule describes how random values are drawn from the inverse cumulative distribution functions and the trace when terms of the form sampleD(c1,...,c|D|) are encountered. Similarly, the Weight rule determines how the weight is updated when weight(c) terms are encountered. Finally, the resample construct always evaluates to unit, and is therefore meaningless from the perspective of this semantics. We elaborate on the role of the resample construct in Section 3.3.

With the semantics in place, we define two important functions over S for a given term. In the below definition, assume that a fixed term **t** is given.

### **Definition 6.**

$$r_{\mathbf{t}}(s) = \begin{cases} \mathbf{v} & \text{if } \mathbf{t}, 1, s \to^* \mathbf{v}, w, ()_{\mathbb{S}} \\ () & \text{otherwise} \end{cases} \qquad f_{\mathbf{t}}(s) = \begin{cases} w & \text{if } \mathbf{t}, 1, s \to^* \mathbf{v}, w, ()_{\mathbb{S}} \\ 0 & \text{otherwise} \end{cases} \tag{6}$$

<sup>6</sup> Note that the semantics models stochastic behavior, but is itself a deterministic relation.

Intuitively, $r_{\mathbf{t}}$ is the function returning the result value after having repeatedly applied $\to$ on the initial trace $s$. Analogously, $f_{\mathbf{t}}$ gives the density, or weight, of a particular $s$. Note that if $(\mathbf{t}, 1, s)$ gets stuck or diverges, the result value is $()$ and the weight is 0. In other words, we disregard such traces entirely, since in practice we are only interested in probability distributions over values. Furthermore, note that if the final trace is non-empty (i.e., not $()_{\mathbb{S}}$), the value and weight are again $()$ and 0, respectively. The motivation for this is discussed in Section 4.3.
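The roles of $r_{\mathbf{t}}$ and $f_{\mathbf{t}}$ can be illustrated with a deliberately simplified Python interpreter for straight-line programs (a sketch of our own; the real semantics handles full terms):

```python
def run(ops, trace):
    """Evaluate a straight-line program against a trace -> (value, weight).

    Mirrors Definition 6: the result is ((), 0) unless evaluation reaches a
    value with the trace consumed exactly.
    """
    env, weight, s = [], 1.0, list(trace)
    for op, arg in ops:
        if op == 'sample':           # arg: an inverse CDF, consumes the head
            if not s:
                return ((), 0.0)     # stuck: trace too short
            env.append(arg(s.pop(0)))
        elif op == 'weight':         # arg: a non-negative constant
            weight *= arg
        elif op == 'resample':       # evaluates to unit under ->
            pass
    if s:
        return ((), 0.0)             # leftover randomness: non-empty final trace
    return (env[-1] if env else (), weight)
```

For example, `run([('sample', lambda u: u), ('weight', 0.5)], [0.3])` yields `(0.3, 0.5)`, while the empty trace and the over-long trace `[0.3, 0.7]` both yield `((), 0.0)`.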

To illustrate $r_{\mathbf{t}}$, $f_{\mathbf{t}}$, and the weight construct, consider the program $\mathbf{t}_{obs}$ in Fig. 2b, and the singleton trace $(0.8)_{\mathbb{S}}$. This program will, in total, evaluate one call to sample, and three calls to weight. Now, let $h(c) = F^{-1}_{Beta}(2, 2, c)$ and recall the function $\sigma_{f_{Bern}}$ from Section 3.1. Using the notation $\phi(c, x) = \sigma_{f_{Bern}}(h(c), x)$, we have, for some evaluation contexts $\mathbf{E}_1$, $\mathbf{E}_2$, $\mathbf{E}_3$, $\mathbf{E}_4$,

$$\begin{aligned}
&\mathbf{t}_{obs}, 1, (0.8)_{\mathbb{S}} = \mathbf{E}_1[\mathtt{sample}_{Beta}(2,2)], 1, (0.8)_{\mathbb{S}} \to \mathbf{E}_1[h(0.8)], 1, ()_{\mathbb{S}} \\
&\to \mathbf{E}_2[\mathtt{weight}(\phi(0.8, \mathit{true}))], 1, ()_{\mathbb{S}} \to \mathbf{E}_2[()], \phi(0.8, \mathit{true}), ()_{\mathbb{S}} \\
&= \mathbf{E}_2[()], h(0.8), ()_{\mathbb{S}} \to^+ \mathbf{E}_3[()], \phi(0.8, \mathit{false}) \cdot h(0.8), ()_{\mathbb{S}} \\
&\to^+ \mathbf{E}_4[()], \phi(0.8, \mathit{true}) \cdot (1 - h(0.8)) \cdot h(0.8), ()_{\mathbb{S}} \\
&\to^+ h(0.8),\ h(0.8) \cdot (1 - h(0.8)) \cdot h(0.8),\ ()_{\mathbb{S}}.
\end{aligned} \tag{7}$$

That is, $r_{\mathbf{t}_{obs}}((0.8)_{\mathbb{S}}) = h(0.8)$ and $f_{\mathbf{t}_{obs}}((0.8)_{\mathbb{S}}) = h(0.8)^2(1 - h(0.8))$. For arbitrary $c$, we see that $r_{\mathbf{t}_{obs}}((c)_{\mathbb{S}}) = h(c)$ and $f_{\mathbf{t}_{obs}}((c)_{\mathbb{S}}) = h(c)^2(1 - h(c))$. For any other trace $s$ with $|s| \neq 1$, $r_{\mathbf{t}_{obs}}(s) = ()$ and $f_{\mathbf{t}_{obs}}(s) = 0$. We will apply this result when reconsidering this example in Section 4.3.
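The function $h$ has no elementary closed form, but since the Beta(2, 2) CDF is $3x^2 - 2x^3$, it can be inverted numerically. A Python sketch of our own, for illustration:

```python
def beta22_cdf(x):
    # Beta(2,2) has density 6x(1-x) on [0,1]; integrating gives CDF 3x^2 - 2x^3.
    return 3 * x**2 - 2 * x**3

def h(c, tol=1e-12):
    """F^{-1}_Beta(2, 2, c) by bisection: the CDF is strictly increasing on [0,1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if beta22_cdf(mid) < c:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def f_tobs(c):
    """Density of the singleton trace (c): h(c)^2 * (1 - h(c))."""
    p = h(c)
    return p * p * (1 - p)
```

By symmetry of Beta(2, 2), $h(0.5) = 0.5$, so $f_{\mathbf{t}_{obs}}((0.5)_{\mathbb{S}}) = 0.5^2 \cdot 0.5 = 0.125$; traces with heads above 0.5 map to biases above 0.5.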

### **3.3 Resampling Semantics**

In order to connect SMC in PPLs to the classical formalization of SMC presented in Section 5, and thus enable the theoretical treatments in Sections 6 and 7, we need a relation in which terms "stop" after a certain number $n$ of encountered resample terms. In this section, we define such a relation, denoted by $\hookrightarrow$. Its definition is given below.

### **Definition 7.**

$$\begin{gathered}
\frac{\mathbf{t} \neq \mathbf{E}[\mathtt{resample}] \qquad \mathbf{t}, w, s \to \mathbf{t}', w', s'}{\mathbf{t}, w, s, n \hookrightarrow \mathbf{t}', w', s', n}\ \text{(Stoch-Fin)} \\[1.5ex]
\frac{n > 0 \qquad \mathbf{E}[\mathtt{resample}], w, s \to \mathbf{E}[()], w, s}{\mathbf{E}[\mathtt{resample}], w, s, n \hookrightarrow \mathbf{E}[()], w, s, n - 1}\ \text{(Resample-Fin)}
\end{gathered} \tag{8}$$

This relation is $\to$ extended with a natural number $n$, indicating how many further resample terms may be evaluated. We implement this limitation by replacing the rule (Resample) of $\to$ with the rule (Resample-Fin) of $\hookrightarrow$ above, which decrements $n$ each time it is applied, causing terms to get stuck at the $(n+1)$-th resample encountered.

Now, assume that a fixed term **t** is given. We define $r_{\mathbf{t},n}$ and $f_{\mathbf{t},n}$ analogously to $r_{\mathbf{t}}$ and $f_{\mathbf{t}}$.

**Definition 8.**
$$r_{\mathbf{t},n}(s) = \begin{cases} \mathbf{v} & \text{if } \mathbf{t}, 1, s, n \hookrightarrow^* \mathbf{v}, w, ()_{\mathbb{S}}, n' \\ \mathbf{E}[\mathtt{resample}] & \text{if } \mathbf{t}, 1, s, n \hookrightarrow^* \mathbf{E}[\mathtt{resample}], w, ()_{\mathbb{S}}, 0 \\ () & \text{otherwise} \end{cases}$$

**Definition 9.**
$$f_{\mathbf{t},n}(s) = \begin{cases} w & \text{if } \mathbf{t}, 1, s, n \hookrightarrow^* \mathbf{v}, w, ()_{\mathbb{S}}, n' \\ w & \text{if } \mathbf{t}, 1, s, n \hookrightarrow^* \mathbf{E}[\mathtt{resample}], w, ()_{\mathbb{S}}, 0 \\ 0 & \text{otherwise} \end{cases}$$

As for $r_{\mathbf{t}}$ and $f_{\mathbf{t}}$, these functions return the result value and weight, respectively, after having repeatedly applied $\hookrightarrow$ on the initial trace $s$. There is one difference compared to $\to$: besides values, we now also allow stopping with non-zero weight at terms of the form $\mathbf{E}[\mathtt{resample}]$.

To illustrate $\hookrightarrow$, $r_{\mathbf{t},n}(s)$, and $f_{\mathbf{t},n}(s)$, consider the term $\mathbf{t}_{seq}$ defined by

$$\begin{aligned}
&\mathtt{let}\ \mathit{observe}\ x\ o = \mathtt{weight}(f_{\mathcal{N}}(x, 4, o));\ \mathtt{resample}\ \mathtt{in} \\
&\mathtt{let}\ \mathit{sim}\ x_{n-1}\ o_n = \\
&\qquad \mathtt{let}\ x_n = \mathtt{sample}_{\mathcal{N}}(x_{n-1} + 2, 1)\ \mathtt{in}\ \mathit{observe}\ x_n\ o_n;\ x_n\ \mathtt{in} \\
&\mathtt{let}\ x_0 = \mathtt{sample}_{\mathcal{N}}(0, 100)\ \mathtt{in} \\
&\mathtt{let}\ f = \mathit{fold}\ \mathit{sim}\ \mathtt{in}\ f\ x_0\ [c_1, c_2, \dots, c_{t-1}, c_t].
\end{aligned} \tag{9}$$

This term encodes a model in which an object moves along a real-valued axis in discrete time steps, but where the actual positions $(x_1, x_2, \dots)$ can only be observed through a noisy sensor $(c_1, c_2, \dots)$. The inference problem consists of finding the probability distribution of the very last position, $x_t$, given all collected observations $(c_1, c_2, \dots, c_t)$. Most importantly, note the position of resample in (9): it is evaluated just after evaluating weight in every folding step. Because of this, for $n < t$ and all traces $s$ such that $f_{\mathbf{t}_{seq},n}(s) > 0$, we have $r_{\mathbf{t}_{seq},n}(s) = \mathbf{E}^n_{seq}[\mathtt{resample};\ x_n]$, where $\mathbf{E}^n_{seq} = f\ [\cdot]\ [c_{n+1}, c_{n+2}, \dots, c_{t-1}, c_t]$ and where $x_n$ is the value sampled in $\mathit{sim}$ at the $n$-th folding step. That is, we can now "stop" evaluation at resamples. We will revisit this example in Section 6.
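The budgeted relation can again be mimicked by a toy interpreter. This Python sketch (our own simplification, which ignores sampled values) stops with its current weight at the $(n+1)$-th resample, mirroring Definitions 8 and 9:

```python
def run_n(ops, trace, n):
    """Evaluate with a budget of n resamples, stopping at the (n+1)-th.

    Returns (stopped_at_resample, remaining_ops, weight); the weight is 0
    unless the trace is consumed exactly, as in Definitions 8 and 9.
    """
    weight, s = 1.0, list(trace)
    for i, (op, arg) in enumerate(ops):
        if op == 'sample':
            if not s:
                return (False, None, 0.0)
            s.pop(0)                 # consume the trace head (value unused here)
        elif op == 'weight':
            weight *= arg
        elif op == 'resample':
            if n == 0:
                # Stuck at the (n+1)-th resample: where SMC will resample.
                return (True, ops[i:], weight if not s else 0.0)
            n -= 1
    return (False, None, weight if not s else 0.0)

# One sample, a weight, a resample barrier, then another weight:
ops = [('sample', None), ('weight', 2.0), ('resample', None), ('weight', 3.0)]
stopped, rest, w = run_n(ops, [0.5], 0)   # budget 0: stops at the resample
done = run_n(ops, [0.5], 1)               # budget 1: runs to completion
```

With budget 0 the run stops at the resample carrying weight 2.0; with budget 1 it finishes with weight 6.0.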

# **4 The Target Measure of a Program**

In this section, we define the target measure induced by any given program in our calculus. We assume basic familiarity with measure theory, Lebesgue integration, and Borel spaces. McDonald and Weiss [23] provide a pedagogical introduction to the subject. In order to define the target measure of a program as a Lebesgue integral (Section 4.3), we require a measure space on traces (Section 4.1), and a measurable space on terms (Section 4.2). For illustration, we derive the target measure for the example program from Section 3 in Section 4.3. The concepts presented in this section are quite standard, and experienced readers might want to quickly skim it, or even skip it entirely.

### **4.1 A Measure Space over Traces**

We use a standard measure space over traces of samples [22]. First, we define a measurable space over traces. We denote the Borel $\sigma$-algebra on $\mathbb{R}^n$ by $\mathcal{B}^n$, and the Borel $\sigma$-algebra on $[0, 1]^n$ by $\mathcal{B}^n_{[0,1]}$.

**Definition 10.** The $\sigma$-algebra $\mathcal{S}$ on $\mathbb{S}$ is the $\sigma$-algebra consisting of sets of the form $S = \bigcup_{n \in \mathbb{N}_0} B_n$ with $B_n \in \mathcal{B}^n_{[0,1]}$. Naturally, $[0, 1]^0$ is the singleton set containing the empty trace. In other words, $([0, 1]^0, \mathcal{B}^0_{[0,1]}) = (\{()_{\mathbb{S}}\}, \{\{()_{\mathbb{S}}\}, \emptyset\})$, where $()_{\mathbb{S}}$ denotes the empty trace.

**Lemma 1.** (S, <sup>S</sup>) is a measurable space.†

The most common measure on $\mathcal{B}^n$ is the $n$-dimensional Lebesgue measure, denoted $\lambda_n$. For $n = 0$, we let $\lambda_0 = \delta_{()_{\mathbb{S}}}$, where $\delta$ denotes the standard Dirac measure. By combining the Lebesgue measures for each $n$, we construct a measure $\mu_{\mathbb{S}}$ over $(\mathbb{S}, \mathcal{S})$.

**Definition 11.** $\mu_{\mathbb{S}}(S) = \mu_{\mathbb{S}}\left(\bigcup_{n \in \mathbb{N}_0} B_n\right) = \sum_{n \in \mathbb{N}_0} \lambda_n(B_n)$

**Lemma 2.** (S, <sup>S</sup>, μS) is a measure space. Furthermore, <sup>μ</sup><sup>S</sup> is <sup>σ</sup>-finite.†

A comment on notation: we denote universal sets by blackboard-bold capital letters (e.g., $\mathbb{S}$), $\sigma$-algebras by calligraphic capital letters (e.g., $\mathcal{S}$), members of $\sigma$-algebras by capital letters (e.g., $S$), and individual elements by lowercase letters (e.g., $s$).

### **4.2 A Measurable Space over Terms**

In order to show that $r_{\mathbf{t}}$ is measurable, we need a measurable space over terms. We let $(\mathbb{T}, \mathcal{T})$ denote the measurable space that we seek to construct, and follow the approach of Staton et al. [35] and Vákár et al. [39]. Because our calculus includes the reals, we would like to at least have $\mathcal{B} \subset \mathcal{T}$. Furthermore, we would also like to extend the Borel measurable sets $\mathcal{B}^n$ to terms with $n$ reals as subterms. For instance, we want sets of the form $\{(\lambda x.\ (\lambda y.\ x + y)\ c_2)\ c_1 \mid (c_1, c_2) \in B_2\}$ to be measurable, where $B_2 \in \mathcal{B}^2$. This leads us to consider terms in a language in which constants (i.e., reals) are replaced with placeholders $[\cdot]$.

**Definition 12.** Let $\mathbf{v}_p ::= [\cdot] \mid \lambda x.\mathbf{t}$ replace the values $\mathbf{v}$ from Definition 1. The set of all terms in the resulting new calculus is denoted by $\mathbb{T}_p$.

Most importantly, it is easy to verify that T<sup>p</sup> is countable. Next, we make the following definitions.

**Definition 13.** For $n \in \mathbb{N}_0$, we denote by $\mathbb{T}^n_p \subset \mathbb{T}_p$ the set of all terms with exactly $n$ placeholders.

**Definition 14.** We let $\mathbf{t}^n_p$ range over the elements of $\mathbb{T}^n_p$. Each $\mathbf{t}^n_p$ can be regarded as a function $\mathbf{t}^n_p : \mathbb{R}^n \to \mathbf{t}^n_p(\mathbb{R}^n)$ which replaces the $n$ placeholders with the $n$ reals given as arguments.

**Definition 15.** $\mathcal{T}_{\mathbf{t}^n_p} = \{\mathbf{t}^n_p(B_n) \mid B_n \in \mathcal{B}^n\}$.

From the above definitions, we construct the required σ-algebra T .

**Definition 16.** The $\sigma$-algebra $\mathcal{T}$ on $\mathbb{T}$ is the $\sigma$-algebra consisting of sets of the form $T = \bigcup_{n \in \mathbb{N}_0} \bigcup_{\mathbf{t}^n_p \in \mathbb{T}^n_p} \mathbf{t}^n_p(B_n)$, where each $B_n \in \mathcal{B}^n$.

**Lemma 3.** (T, <sup>T</sup> ) is a measurable space.†

### **4.3 The Target Measure**

We are now in a position to define the target measure. We will first give the formal definitions, and then illustrate the definitions with an example. The definitions rely on the following result.

**Lemma 4.** $r_{\mathbf{t}} : (\mathbb{S}, \mathcal{S}) \to (\mathbb{T}, \mathcal{T})$ and $f_{\mathbf{t}} : (\mathbb{S}, \mathcal{S}) \to (\mathbb{R}_+, \mathcal{B}_+)$ are measurable.†

We can now proceed to define the measure $\langle\!\langle \mathbf{t} \rangle\!\rangle$ over $\mathbb{S}$ induced by a term $\mathbf{t}$, using Lebesgue integration.

**Definition 17.** $\langle\!\langle \mathbf{t} \rangle\!\rangle(S) = \int_S f_{\mathbf{t}}(s)\, d\mu_{\mathbb{S}}(s)$

Using Definition 17 and the measurability of $r_{\mathbf{t}}$, we can also define a corresponding pushforward measure $[\![\mathbf{t}]\!]$ over $\mathbb{T}$.

**Definition 18.** $[\![\mathbf{t}]\!](T) = \langle\!\langle \mathbf{t} \rangle\!\rangle(r_{\mathbf{t}}^{-1}(T)) = \int_{r_{\mathbf{t}}^{-1}(T)} f_{\mathbf{t}}(s)\, d\mu_{\mathbb{S}}(s)$.

The measure $[\![\mathbf{t}]\!]$ is our target measure, i.e., the measure encoded by our program that we are interested in.

Let us now consider the target measure for the program given by $\mathbf{t}_{obs}$. It is not too difficult to show that $[\![\mathbf{t}_{obs}]\!](T) = \int_{T \cap \mathbb{R}} c^3(1-c)^2\, d\lambda(c)$. We recognize the integrand as (up to normalization) the density of the Beta(4, 3) distribution, which, as expected, is exactly the graph shown in Fig. 2c.
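This can be checked numerically: the following Python sketch (our own) integrates the unnormalized integrand over $[0,1]$ and recovers the Beta function value $B(4,3) = 3!\,2!/6! = 1/60$, the normalizing constant of Beta(4, 3).

```python
def target_density(c):
    # Unnormalized target: c^3 * (1 - c)^2, i.e., the Beta(2,2) prior
    # density (up to a constant) times the likelihood of the observations.
    return c**3 * (1 - c)**2

# Midpoint-rule integral over [0, 1]; should approach B(4, 3) = 1/60.
n = 100_000
z = sum(target_density((i + 0.5) / n) for i in range(n)) / n
```

Dividing the integrand by `z` yields the normalized Beta(4, 3) density plotted in Fig. 2c.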

We should in some way ensure the target measure is finite (i.e., can be normalized to a probability measure), since we are in the end most often only interested in probability measures. Unfortunately, as observed by Staton [34], there is no known useful syntactic restriction that enforces finite measures in PPLs while still admitting weights > 1. We will discuss this further in Section 6.2 in relation to SMC in our calculus.

Lastly, recall from Section 3.2 that we disallow non-empty final traces in $f_{\mathbf{t}}$ and $r_{\mathbf{t}}$. We see here why this is needed: if they were allowed, then for every trace $s$ with $f_{\mathbf{t}}(s) > 0$, all extensions $s * s'$ would have the same density $f_{\mathbf{t}}(s * s') = f_{\mathbf{t}}(s) > 0$. From this, it is easy to check that if $[\![\mathbf{t}]\!] \neq 0$ (the zero measure), then $[\![\mathbf{t}]\!](\mathbb{T}) = \infty$ (i.e., the measure is not finite). In fact, for any $T \in \mathcal{T}$, $[\![\mathbf{t}]\!](T) > 0 \implies [\![\mathbf{t}]\!](T) = \infty$. Clearly, this is not a useful target measure.

# **5 Formal SMC**

In this section, we give a generic formalization of SMC based on Chopin [6]. We assume a basic understanding of SMC. For a complete introduction to SMC, we recommend Naesseth et al. [26] and Doucet and Johansen [10].

First, in Section 5.1, we introduce transition kernels, a fundamental concept used in the remaining sections of the paper. Second, in Section 5.2, we describe Chopin's generic formalization of SMC as an algorithm for approximating a sequence of distributions based on a sequence of approximating transition kernels. Lastly, in Section 5.3, we give standard correctness results for the algorithm.

### **5.1 Preliminaries: Transition Kernels**

Intuitively, transition kernels describe how elements move between measurable spaces. For a more comprehensive introduction, see Vákár and Ong [40].

**Definition 19.** Let $(A, \mathcal{A})$ and $(A', \mathcal{A}')$ be measurable spaces, and let $\mathcal{B}^*_+ = \{B \mid B \setminus \{\infty\} \in \mathcal{B}_+\}$. A function $k : A \times \mathcal{A}' \to \mathbb{R}^*_+$ is a (transition) kernel if (1) for all $a \in A$, $k(a, \cdot) : \mathcal{A}' \to \mathbb{R}^*_+$ is a measure on $\mathcal{A}'$, and (2) for all $A' \in \mathcal{A}'$, $k(\cdot, A') : (A, \mathcal{A}) \to (\mathbb{R}^*_+, \mathcal{B}^*_+)$ is measurable.

Additionally, we can classify transition kernels according to the below definition.

**Definition 20.** Let $(A, \mathcal{A})$ and $(A', \mathcal{A}')$ be measurable spaces. A kernel $k : A \times \mathcal{A}' \to \mathbb{R}^*_+$ is a sub-probability kernel if $k(a, \cdot)$ is a sub-probability measure for all $a \in A$; a probability kernel if $k(a, \cdot)$ is a probability measure for all $a \in A$; and a finite kernel if $\sup_{a \in A} k(a, A') < \infty$.
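On finite spaces, the kernel classes of Definition 20 reduce to simple checks on row sums. The following sketch is our own illustration (the tabular representation and function names are ours, not the paper's):

```python
# Hedged illustration: on *finite* spaces, a kernel k : A x A' -> R+ can be
# tabulated as one row of point masses per source element a. The classes of
# Definition 20 then become checks on the total row masses k(a, A').

def row_sums(kernel):
    """kernel maps each a in A to a dict from a' in A' to the mass k(a, {a'})."""
    return [sum(row.values()) for row in kernel.values()]

def is_probability_kernel(kernel, tol=1e-12):
    return all(abs(s - 1.0) <= tol for s in row_sums(kernel))

def is_sub_probability_kernel(kernel, tol=1e-12):
    return all(s <= 1.0 + tol for s in row_sums(kernel))

def is_finite_kernel(kernel):
    return max(row_sums(kernel)) < float("inf")

# A kernel may lose mass, e.g. when evaluation can diverge (cf. Section 7.1):
k = {"start": {"value": 0.7, "resample": 0.2}}  # k("start", A') = 0.9 < 1
```

Here the row for `"start"` sums to 0.9, so `k` is a sub-probability kernel but not a probability kernel.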

### **5.2 Algorithm**

The starting point in Chopin's formulation of SMC is a sequence of probability measures $\pi_n$ (over respective measurable spaces $(A_n, \mathcal{A}_n)$, with $n \in \mathbb{N}_0$) that are difficult or impossible to draw samples from directly.

The SMC approach is to generate samples from the $\pi_n$ by first sampling from a sequence of proposal measures $q_n$, and then correcting for the discrepancy between these measures by weighting the proposal samples. The proposal distributions are generated from an initial measure $q_0$ and a sequence of transition kernels $k_n : A_{n-1} \times \mathcal{A}_n \to [0,1]$, $n \in \mathbb{N}$, as

$$q_n(A_n) = \int_{A_{n-1}} k_n(a_{n-1}, A_n) \, d\pi_{n-1}(a_{n-1}).\tag{10}$$

In order to approximate $\pi_n$ by weighting samples from $q_n$, we need some way of obtaining the appropriate weights. Hence, we require each measurable space $(A_n, \mathcal{A}_n)$ to have a default $\sigma$-finite measure $\mu_{A_n}$, and the measures $\pi_n$ and $q_n$ to have densities $f_{\pi_n}$ and $f_{q_n}$ with respect to this default measure. Furthermore, we require that the functions $f_{\pi_n}$ and $f_{q_n}$ can be efficiently computed pointwise, up to an unknown constant factor per function and value of $n$. More precisely, we can efficiently compute the densities $f_{\tilde{\pi}_n} = Z_{\pi_n} \cdot f_{\pi_n}$ and $f_{\tilde{q}_n} = Z_{q_n} \cdot f_{q_n}$, corresponding to the unnormalized measures $\tilde{\pi}_n = Z_{\pi_n} \cdot \pi_n$ and $\tilde{q}_n = Z_{q_n} \cdot q_n$. Here, $Z_{\pi_n} = \tilde{\pi}_n(A_n) \in \mathbb{R}_+$ and $Z_{q_n} = \tilde{q}_n(A_n) \in \mathbb{R}_+$ denote the unknown normalizing constants for the distributions $\tilde{\pi}_n$ and $\tilde{q}_n$.

**Algorithm 1** A generic formulation of sequential Monte Carlo inference based on Chopin [6]. In each step, we let $1 \le j \le J$, where $J$ is the number of samples.

- The empirical distribution given by $\{(a_n^j, w_n^j)\}_{j=1}^J$ approximates $\pi_n$.

Algorithm 1 presents a generic version of SMC [6] for approximating $\pi_n$. We make the notion of approximation used in the algorithm precise in Section 5.3. Note that in the correction step, the unnormalized pointwise evaluations of $f_{\pi_n}$ and $f_{q_n}$ are used to calculate the weights. In the algorithm description, we also use some new terminology. First, an empirical distribution is the discrete probability measure formed by a finite set of possibly weighted samples $\{(a_n^j, w_n^j)\}_{j=1}^J$, where $a_n^j \in A_n$ and $w_n^j \in \mathbb{R}_+$. Second, when resampling an empirical distribution, we sample $J$ times from it (with replacement), with each sample having its normalized weight as its probability of being sampled. More specifically, this is known as multinomial resampling. Other resampling schemes also exist [8], and are often used in practice to reduce variance. After resampling, the set of samples forms a new empirical distribution with $J$ unweighted (all $w_n^j = 1$) samples.
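As a concrete illustration (ours, not the paper's), multinomial resampling and one common variance-reducing alternative, systematic resampling, can be sketched as follows:

```python
import random

def multinomial_resample(samples, weights, rng=random):
    """Multinomial resampling: J independent draws with replacement, each
    sample chosen with probability proportional to its weight."""
    return rng.choices(samples, weights=weights, k=len(samples))

def systematic_resample(samples, weights, rng=random):
    """A common lower-variance alternative [8]: a single uniform draw placed
    at J equally spaced points through the cumulative normalized weights."""
    J = len(samples)
    total = sum(weights)
    points = [(rng.random() + j) / J for j in range(J)]  # sorted by design
    out, cum, i = [], 0.0, 0
    for sample, w in zip(samples, weights):
        cum += w / total
        while i < J and points[i] <= cum:
            out.append(sample)
            i += 1
    while i < J:  # guard against floating-point shortfall in cum
        out.append(samples[-1])
        i += 1
    return out
```

After either scheme, the $J$ returned samples are treated as unweighted (all $w_n^j = 1$).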

An important feature of SMC compared to other inference algorithms is that SMC produces, as a by-product of inference, unbiased estimates $\hat{Z}_{\pi_n}$ of the normalizing constants $Z_{\pi_n}$. Stated differently, this means that Algorithm 1 not only approximates the $\pi_n$, but also the unnormalized versions $\tilde{\pi}_n$. From the weights $w_n^j$ in Algorithm 1, the estimates are given by

$$\hat{Z}_{\pi_n} = \prod_{i=0}^{n} \frac{1}{J} \sum_{j=1}^{J} w_i^j \approx Z_{\pi_n} \tag{11}$$

for each $\tilde{\pi}_n$. We give the unbiasedness result for $\hat{Z}_{\pi_n}$ in Lemma 5 (item 2) below. The normalizing constant is often used to compare the accuracy of different probabilistic models, and as such, it is also known as the marginal likelihood, or model evidence. For an example application, see Ronquist et al. [30].
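Concretely, the estimate (11) is simply a product over time steps of the average weight at each step; a minimal sketch (our own, with hypothetical argument names):

```python
def normalizing_constant_estimate(weight_history):
    """Z-hat from Equation (11): the product over time steps i of the
    average weight (1/J) * sum_j w_i^j at that step.

    weight_history[i] is the list of the J weights w_i^j at step i."""
    z_hat = 1.0
    for step_weights in weight_history:
        z_hat *= sum(step_weights) / len(step_weights)
    return z_hat
```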

To conclude this section, note that many sequences of probability kernels $k_n$ can be used to approximate the same sequence of measures $\pi_n$. The only requirement on the $k_n$ is that $f_{\pi_n}(a_n) > 0 \implies f_{q_n}(a_n) > 0$ must hold for all $n \in \mathbb{N}_0$ and $a_n \in A_n$ (i.e., the proposals must "cover" the $\pi_n$) [9]. We call such a sequence of kernels $k_n$ valid. Different choices of $k_n$ induce different proposals $q_n$, and hence capture different SMC algorithms. The most common example is the BPF, which directly uses the kernels from the model as the sequence of kernels in the SMC algorithm (hence the "bootstrap"). In Section 7.1, we formalize the bootstrap kernels in the context of our calculus. However, we may want to choose other probability kernels that satisfy the covering condition, since the choice of kernels can have major implications for the rate of convergence [28].
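Putting the pieces of this section together, the loop of Algorithm 1 can be sketched as follows. This is our own minimal rendering of Chopin's scheme, assuming the user supplies the initial proposal sampler, the kernel samplers, and pointwise evaluators for the unnormalized densities $f_{\tilde{\pi}_n}$ and $f_{\tilde{q}_n}$:

```python
import random

def smc(sample_q0, f_pi, f_q, sample_k, num_steps, J, rng=random):
    """Sketch of generic SMC (Algorithm 1).

    sample_q0() draws from q_0; sample_k(n, a) draws from k_n(a, .);
    f_pi(n, a) and f_q(n, a) evaluate the unnormalized densities of pi_n
    and q_n pointwise. Yields the weighted samples for each step n."""
    samples = [sample_q0() for _ in range(J)]
    # Correction: weight each proposal sample against the target pi_0.
    weights = [f_pi(0, a) / f_q(0, a) for a in samples]
    yield samples, weights
    for n in range(1, num_steps + 1):
        # Selection: multinomial resampling proportional to the weights.
        selected = rng.choices(samples, weights=weights, k=J)
        # Mutation: move each selected sample with the kernel k_n.
        samples = [sample_k(n, a) for a in selected]
        # Correction: reweight against the next target pi_n.
        weights = [f_pi(n, a) / f_q(n, a) for a in samples]
        yield samples, weights
```

The yielded pairs are the empirical distributions that, per Section 5.3, approximate the $\pi_n$.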

### **5.3 Correctness**

We begin by defining the notion of approximation used in Algorithm 1.

**Definition 21 (Based on Chopin [6, p. 2387]).** Let $(A, \mathcal{A})$ denote a measurable space, $\{\{(a_{j,J}, w_{j,J})\}_{j=1}^J\}_{J \in \mathbb{N}}$ a triangular array of random variables in $A \times \mathbb{R}$, and $\pi : \mathcal{A} \to \mathbb{R}^*_+$ a probability measure. We say that $\{\{(a_{j,J}, w_{j,J})\}_{j=1}^J\}_{J \in \mathbb{N}}$ approximates $\pi$ if the equality
$$\lim_{J \to \infty} \frac{\sum_{j=1}^{J} w_{j,J} \, \varphi(a_{j,J})}{\sum_{j=1}^{J} w_{j,J}} = \mathbb{E}_\pi(\varphi)$$
holds almost surely for all measurable functions $\varphi : (A, \mathcal{A}) \to (\mathbb{R}, \mathcal{B})$ such that $\mathbb{E}_\pi(\varphi)$, the expected value of the function $\varphi$ over the distribution $\pi$, exists.

First, note that the triangular array can also be viewed as a sequence of random empirical distributions (indexed by J). Precisely such sequences are formed by the random empirical distributions in Algorithm 1 when indexed by the increasing number of samples J. For simplicity, we often let context determine the sequence, and directly state that a random empirical distribution approximates some distribution (as in Algorithm 1).
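The estimator inside Definition 21 is the familiar self-normalized importance sampling estimate. A small numeric sketch (our own example, not from the paper; the proposal, target, and function names are our assumptions):

```python
import math
import random

def self_normalized_estimate(weighted_samples, phi):
    """sum_j w_j * phi(a_j) / sum_j w_j, as in Definition 21."""
    num = sum(w * phi(a) for a, w in weighted_samples)
    den = sum(w for _, w in weighted_samples)
    return num / den

# Approximate E_pi(a) for pi = Exp(1), proposing from Uniform(0, 10) and
# weighting by the unnormalized target density e^{-a}; the estimate should
# approach E_pi(a), which is close to 1, as the number of samples J grows.
random.seed(0)
draws = (random.uniform(0.0, 10.0) for _ in range(100_000))
ws = [(a, math.exp(-a)) for a in draws]
estimate = self_normalized_estimate(ws, lambda a: a)
```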

Two classical results in SMC literature are given in the following lemma: a law of large numbers and the unbiasedness of the normalizing constant estimate. We take these results as the definition of SMC correctness used in this paper.

**Lemma 5.** Let $\pi_n$, $n \in \mathbb{N}_0$, be a sequence of probability measures over measurable spaces $(A_n, \mathcal{A}_n)$ with default $\sigma$-finite measures $\mu_{A_n}$, such that the $\pi_n$ have densities $f_{\pi_n}$ with respect to these default measures. Furthermore, let $q_0$ be a probability measure with density $f_{q_0}$ with respect to $\mu_{A_0}$, and $k_n$ a sequence of probability kernels inducing a sequence of proposal probability measures $q_n$, given by (10), over $(A_n, \mathcal{A}_n)$ with densities $f_{q_n}$ with respect to $\mu_{A_n}$. Also, assume the $k_n$ are valid, i.e., that $f_{\pi_n}(a_n) > 0 \implies f_{q_n}(a_n) > 0$ holds for all $n \in \mathbb{N}_0$ and $a_n \in A_n$. Then

1. the empirical distributions $\{(a_n^j, w_n^j)\}_{j=1}^J$ and $\{\hat{a}_n^j\}_{j=1}^J$ produced by Algorithm 1 approximate $\pi_n$ for each $n \in \mathbb{N}_0$; and

2. $\mathbb{E}(\hat{Z}_{\pi_n}) = Z_{\pi_n}$ for each $n \in \mathbb{N}_0$, where the expectation is taken with respect to the weights produced when running Algorithm 1, and $\hat{Z}_{\pi_n}$ is given by (11).

Proof. As referenced in Naesseth et al. [26], see Del Moral [7, Theorem 7.4.3] for 1. For 2, see Naesseth et al. [26, Appendix 4.A].

Chopin [6, Theorem 1] gives another SMC convergence result in the form of a central limit theorem. This result, however, requires further restrictions on the weights $w_n^j$ in Algorithm 1. It is not clear when these restrictions are fulfilled when applying SMC to a program in our calculus. This is an interesting topic for future work.

# **6 Formal SMC for Probabilistic Programming Languages**

This section contains our main contribution: how to interpret the operational semantics of our calculus as the unnormalized sequence of measures $\tilde{\pi}_n$ in Chopin's formalization (Section 6.1), as well as sufficient conditions for this sequence of approximating measures to converge to $[\![\mathbf{t}]\!]$ and for the normalizing constant estimate to be correct (Section 6.2).

An important insight during this work was that it is more convenient to find a sequence of measures $\langle\langle \mathbf{t} \rangle\rangle_n$ approximating the trace measure $\langle\langle \mathbf{t} \rangle\rangle$ than to find a sequence of measures $[\![\mathbf{t}]\!]_n$ directly approximating the target measure $[\![\mathbf{t}]\!]$. In Section 6.1, we define $\langle\langle \mathbf{t} \rangle\rangle_n$ similarly to $\langle\langle \mathbf{t} \rangle\rangle$, except that at most $n$ evaluations of resample are allowed. This upper bound on the number of resamples is formalized through the indexed evaluation relation from Section 3.3.

In Section 6.2, we obtain two different conditions for the convergence of the sequence $\langle\langle \mathbf{t} \rangle\rangle_n$ to $\langle\langle \mathbf{t} \rangle\rangle$: Theorem 1 states that for programs with an upper bound $N$ on the number of resamples they evaluate, $\langle\langle \mathbf{t} \rangle\rangle_N = \langle\langle \mathbf{t} \rangle\rangle$. This precondition holds in many practical settings, for instance where each resampling is connected to a datum collected before inference starts. Theorem 2 states another convergence result for programs without such an upper bound but with dominated weights. Because of these convergence results, we can often approximate $\langle\langle \mathbf{t} \rangle\rangle$ by approximating $\langle\langle \mathbf{t} \rangle\rangle_n$ with Algorithm 1. When this is the case, Lemma 5 implies that Algorithm 1, either after a sufficient number of time steps or asymptotically, correctly approximates $\langle\langle \mathbf{t} \rangle\rangle$ and the normalizing constant $Z_{\langle\langle \mathbf{t} \rangle\rangle}$. This is the content of Theorem 3. We conclude Section 6.2 by discussing resample placements and their relation to Theorem 3, as well as practical implications of Theorem 3.

### **6.1 The Sequence of Measures Generated by a Program**

We now apply the formalization from Section 4.3 again, but with $f_{\mathbf{t},n}$ and $r_{\mathbf{t},n}$ (from Section 3.3) replacing $f_{\mathbf{t}}$ and $r_{\mathbf{t}}$. Intuitively, this yields a sequence of measures $[\![\mathbf{t}]\!]_n$ indexed by $n$, which are similar to $[\![\mathbf{t}]\!]$, but allow evaluating at most $n$ resamples. To illustrate this idea, consider again the program $\mathbf{t}_{\text{seq}}$ in (9). Here, $[\![\mathbf{t}_{\text{seq}}]\!]_0$ is a distribution over terms of the form $\mathbf{E}^1_{\text{seq}}[\mathbf{resample}; x_1]$, $[\![\mathbf{t}_{\text{seq}}]\!]_1$ a distribution over terms of the form $\mathbf{E}^2_{\text{seq}}[\mathbf{resample}; x_2]$, and so forth. For $n \ge t$, $[\![\mathbf{t}_{\text{seq}}]\!]_n = [\![\mathbf{t}_{\text{seq}}]\!]$, because $t$ is clearly an upper bound on the number of resamples evaluated in $\mathbf{t}_{\text{seq}}$.

While the measures $[\![\mathbf{t}]\!]_n$ are useful for giving intuition, it is easier from a technical perspective to define and work with $\langle\langle \mathbf{t} \rangle\rangle_n$, the sequence of measures over traces where at most $n$ resamples are allowed. First, we need the following result, analogous to Lemma 4.

**Lemma 6.** $r_{\mathbf{t},n} : (\mathbb{S}, \mathcal{S}) \to (\mathbb{T}, \mathcal{T})$ and $f_{\mathbf{t},n} : (\mathbb{S}, \mathcal{S}) \to (\mathbb{R}_+, \mathcal{B}_+)$ are measurable.†

This allows us to define $\langle\langle \mathbf{t} \rangle\rangle_n$ (cf. Definition 17).

**Definition 22.** $\langle\langle \mathbf{t} \rangle\rangle_n(S) = \int_S f_{\mathbf{t},n}(s) \, d\mu_{\mathbb{S}}(s)$

### **6.2 Correctness**

We begin with a convergence result for when the number of calls to resample in a program is upper bounded.

**Theorem 1.** If there is $N \in \mathbb{N}$ such that $f_{\mathbf{t},n} = f_{\mathbf{t}}$ whenever $n > N$, then $\langle\langle \mathbf{t} \rangle\rangle_n = \langle\langle \mathbf{t} \rangle\rangle$ for all $n > N$.

This follows directly since $f_{\mathbf{t},n}$ not only converges to $f_{\mathbf{t}}$, but is also equal to $f_{\mathbf{t}}$ for all $n > N$. However, even if the number of calls to resample in $\mathbf{t}$ is upper bounded, there is still one concern with using $\langle\langle \mathbf{t} \rangle\rangle_n$ as $\tilde{\pi}_n$ in Algorithm 1: there is no guarantee that the measures $\langle\langle \mathbf{t} \rangle\rangle_n$ can be normalized to probability measures and have unique densities (i.e., that they are finite). This is a requirement for the correctness results in Lemma 5. Unfortunately, recall from Section 4.3 that there is no known useful syntactic restriction that enforces finiteness of the target measure. This is clearly true for the measures $\langle\langle \mathbf{t} \rangle\rangle_n$ as well, and as such, we need to assume that the $\langle\langle \mathbf{t} \rangle\rangle_n$ are finite; otherwise, it is not clear that Algorithm 1 produces the correct result, since the conditions in Lemma 5 are not fulfilled. Fortunately, this assumption is valid for most, if not all, models of practical interest. Nevertheless, investigating whether the restriction to probability measures in Lemma 5 can be lifted to some extent is an interesting topic for future work.

Although of limited practical interest, programs with an unbounded number of calls to resample are of interest from a semantic perspective. If we have $\lim_{n\to\infty} \langle\langle \mathbf{t} \rangle\rangle_n = \langle\langle \mathbf{t} \rangle\rangle$ pointwise, then any SMC algorithm approximating the sequence $\langle\langle \mathbf{t} \rangle\rangle_n$ also approximates $\langle\langle \mathbf{t} \rangle\rangle$, at least asymptotically in the number of steps $n$. First, consider the program $\mathbf{t}_{\text{geo-res}}$ given by

> let rec geometric _ = resample; if sample bern(0.6) then 1 + geometric () else 1 in geometric (). (12)

Note that $\mathbf{t}_{\text{geo-res}}$ has no upper bound on the number of calls to resample, and therefore Theorem 1 is not applicable. It is easy, however, to check that $\lim_{n\to\infty} \langle\langle \mathbf{t}_{\text{geo-res}} \rangle\rangle_n = \langle\langle \mathbf{t}_{\text{geo-res}} \rangle\rangle$ pointwise. So does $\lim_{n\to\infty} \langle\langle \mathbf{t} \rangle\rangle_n = \langle\langle \mathbf{t} \rangle\rangle$ hold pointwise in general? The answer is no, as we demonstrate next.

For $\lim_{n\to\infty} \langle\langle \mathbf{t} \rangle\rangle_n = \langle\langle \mathbf{t} \rangle\rangle$ to hold pointwise, it must hold that $\lim_{n\to\infty} f_{\mathbf{t},n} = f_{\mathbf{t}}$ pointwise $\mu_{\mathbb{S}}$-a.e. Unfortunately, this does not hold for all programs. Consider the program $\mathbf{t}_{\text{loop}}$ defined by let rec loop _ = resample; loop () in loop (). Here, $f_{\mathbf{t}_{\text{loop}}} = 0$ since the program diverges deterministically, but $f_{\mathbf{t}_{\text{loop}},n}(()_{\mathbb{S}}) = 1$ for all $n$. Because $\mu_{\mathbb{S}}(\{()_{\mathbb{S}}\}) > 0$, we do not have $\lim_{n\to\infty} f_{\mathbf{t}_{\text{loop}},n} = f_{\mathbf{t}_{\text{loop}}}$ pointwise $\mu_{\mathbb{S}}$-a.e.

Even if we have $\lim_{n\to\infty} f_{\mathbf{t},n} = f_{\mathbf{t}}$ pointwise $\mu_{\mathbb{S}}$-a.e., we might not have $\lim_{n\to\infty} \langle\langle \mathbf{t} \rangle\rangle_n = \langle\langle \mathbf{t} \rangle\rangle$ pointwise. Consider, for instance, the program $\mathbf{t}_{\text{unit}}$ given by

```
let s = sampleU (0, 1) in
let rec foo n =
 if s ≤ 1/n then resample; weight 2; foo (2 · n) else weight 0 in
foo 1
                                                                        (13)
```
We have $f_{\mathbf{t}_{\text{unit}}} = 0$ and $f_{\mathbf{t}_{\text{unit}},n} = 2^n \cdot \mathbf{1}_{[0, 1/2^n]}$ for $n > 0$. Also, $\lim_{n\to\infty} f_{\mathbf{t}_{\text{unit}},n} = f_{\mathbf{t}_{\text{unit}}}$ pointwise $\mu_{\mathbb{S}}$-a.e. However, $\lim_{n\to\infty} \langle\langle \mathbf{t}_{\text{unit}} \rangle\rangle_n(\mathbb{S}) = 1 \neq 0 = \langle\langle \mathbf{t}_{\text{unit}} \rangle\rangle(\mathbb{S})$. This shows that the limit may fail to hold even for programs that terminate almost surely, as is the case for the program $\mathbf{t}_{\text{unit}}$ in (13). In fact, this program is positively almost surely terminating [4], since the expected number of recursive calls to foo is 1.
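These densities can be checked concretely (our own illustration): each $f_{\mathbf{t}_{\text{unit}},n}$ integrates to exactly 1, yet at every fixed $s > 0$ the densities are eventually 0. Moreover, for $s \in (0, 1/2]$ any dominating $g$ would have to satisfy $g(s) \ge \sup_n f_{\mathbf{t}_{\text{unit}},n}(s) \ge 1/(2s)$, which is not integrable, so no dominating integrable $g$ exists:

```python
def f(n, s):
    """The density f_{t_unit, n} stated above: 2^n on [0, 1/2^n], else 0."""
    return 2.0 ** n if 0.0 <= s <= 1.0 / 2.0 ** n else 0.0

def total_mass(n):
    """Exact integral of f(n, .): height 2^n times interval width 1/2^n."""
    return 2.0 ** n * (1.0 / 2.0 ** n)

# <<t_unit>>_n(S) = 1 for every n > 0 ...
masses = [total_mass(n) for n in range(1, 50)]
# ... yet for any fixed s > 0, f(n, s) = 0 as soon as 1/2^n < s, so the
# pointwise limit is 0 (except at s = 0, a Lebesgue-null point).
tail = [f(n, 0.3) for n in range(2, 50)]
```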

Guided by the previous example, we now state the dominated convergence theorem—a fundamental result in measure theory—in the context of SMC inference in our calculus.

**Theorem 2.** Assume that $\lim_{n\to\infty} f_{\mathbf{t},n} = f_{\mathbf{t}}$ holds pointwise $\mu_{\mathbb{S}}$-a.e. Furthermore, assume that there exists a measurable function $g : (\mathbb{S}, \mathcal{S}) \to (\mathbb{R}_+, \mathcal{B}_+)$ such that $f_{\mathbf{t},n} \le g$ $\mu_{\mathbb{S}}$-a.e. for all $n$, and $\int_{\mathbb{S}} g(s) \, d\mu_{\mathbb{S}}(s) < \infty$. Then $\lim_{n\to\infty} \langle\langle \mathbf{t} \rangle\rangle_n = \langle\langle \mathbf{t} \rangle\rangle$ pointwise.

For a proof, see McDonald and Weiss [23, Theorem 4.9]. It is easy to check that for our example in (13), there is no dominating and integrable $g$ as required in Theorem 2, and we have already seen that the conclusion of the theorem fails to hold there. As a corollary, if there exists a dominating and integrable $g$, the measures $\langle\langle \mathbf{t} \rangle\rangle_n$ are always finite.

**Corollary 1.** If there exists a measurable function $g : (\mathbb{S}, \mathcal{S}) \to (\mathbb{R}_+, \mathcal{B}_+)$ such that $f_{\mathbf{t},n} \le g$ $\mu_{\mathbb{S}}$-a.e. for all $n$, and $\int_{\mathbb{S}} g(s) \, d\mu_{\mathbb{S}}(s) < \infty$, then $\langle\langle \mathbf{t} \rangle\rangle_n$ is finite for each $n \in \mathbb{N}_0$.

This holds because $\langle\langle \mathbf{t} \rangle\rangle_n(\mathbb{S}) = \int_{\mathbb{S}} f_{\mathbf{t},n}(s) \, d\mu_{\mathbb{S}}(s) \le \int_{\mathbb{S}} g(s) \, d\mu_{\mathbb{S}}(s) < \infty$. Hence, we do not need to assume the finiteness of the $\langle\langle \mathbf{t} \rangle\rangle_n$ in order for Algorithm 1 to be applicable, as was the case in the setting of Theorem 1.

In Theorem 3, we summarize and combine the above results with Lemma 5.

**Theorem 3.** Let $\mathbf{t}$ be a term, and apply Algorithm 1 with $\langle\langle \mathbf{t} \rangle\rangle_n$ as $\tilde{\pi}_n$, and with arbitrary valid kernels $k_n$. If the condition of Theorem 1 holds and $\langle\langle \mathbf{t} \rangle\rangle_n$ is finite for each $n \in \mathbb{N}_0$, then Algorithm 1 approximates $\langle\langle \mathbf{t} \rangle\rangle$ and its normalizing constant after a finite number of steps. Alternatively, if the condition of Theorem 2 holds, then Algorithm 1 approximates $\langle\langle \mathbf{t} \rangle\rangle$ and its normalizing constant in the limit $n \to \infty$.

This follows directly from Theorem 1, Theorem 2, and Lemma 5.

We conclude this section by discussing resample placements and the practical implications of Theorem 3. First, we define a resample placement for a term $\mathbf{t}$ as the term resulting from replacing arbitrary subterms $\mathbf{t}'$ of $\mathbf{t}$ with resample; $\mathbf{t}'$. Note that such a placement directly corresponds to constructing the sequence $\langle\langle \mathbf{t} \rangle\rangle_n$. Second, note that the measure $\langle\langle \mathbf{t} \rangle\rangle$ and the target measure $[\![\mathbf{t}]\!]$ are clearly unaffected by such a placement: resample simply evaluates to $()$, and for $\langle\langle \mathbf{t} \rangle\rangle$ and $[\![\mathbf{t}]\!]$, there is no bound on how many resamples we can evaluate. As such, we conclude that every resample placement in $\mathbf{t}$ fulfilling one of the two conditions in Theorem 3 leads to a correct approximation of $\langle\langle \mathbf{t} \rangle\rangle$ when applying Algorithm 1. Furthermore, in practice there is always an upper bound on the number of calls to resample, since any concrete run of SMC has an (explicit or implicit) upper bound on its runtime. This is a powerful result, since it implies that when implementing SMC for PPLs, any method for selecting resampling locations in a program is correct under mild conditions (Theorem 1 or Theorem 2) that are most often, if not always, fulfilled in practice. Most importantly, this justifies the basic approach for placing resamples found in WebPPL, Anglican, and Birch, in which every call to weight is directly followed (implicitly) by a call to resample. It also justifies the approach to placing resamples described in Lundén et al. [21]. This latter approach is essential in, e.g., Ronquist et al. [30], in order to increase inference efficiency.

Our results also show that the restriction in Anglican requiring all executions to encounter the same number of resamples is too conservative. Clearly, this is not a requirement in either Theorem 1 or Theorem 2. For instance, the number of calls to resample varies significantly in (12).
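This variation is easy to see by simulation. The sketch below is our own Python rendering of (12), where `rng.random() < 0.6` plays the role of sample bern(0.6); each recursive call evaluates resample once, so the number of resamples equals the returned value and differs from run to run:

```python
import random

def geometric_run(rng):
    """One run of (12): each recursive call evaluates resample once, so the
    number of resamples encountered equals the returned value."""
    resamples = 0
    while True:
        resamples += 1                 # the resample in (12)
        if not (rng.random() < 0.6):   # sample bern(0.6) came up false:
            return resamples           # the 'else 1' branch stops recursing
        # the 'then' branch corresponds to another loop iteration

rng = random.Random(0)
counts = [geometric_run(rng) for _ in range(1000)]
# Executions encounter 1, 2, 3, ... resamples with geometrically decaying
# probability, so a fixed resample count across executions is impossible.
```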

# **7 SMC Algorithms**

In this section, we take a look at how the kernels k<sup>n</sup> in Algorithm 1 can be instantiated to yield the concrete SMC algorithm known as the bootstrap particle filter (Section 7.1), and also discuss other SMC algorithms and how they relate to Algorithm 1 (Section 7.2).

### **7.1 The Bootstrap Particle Filter**

We define for each term $\mathbf{t}$ a particular sequence of kernels $k_{\mathbf{t},n}$ that gives rise to the SMC algorithm known as the bootstrap particle filter (BPF). Informally, these kernels correspond to simply continuing to evaluate the program until either arriving at a value $\mathbf{v}$ or a term of the form $\mathbf{E}[\mathbf{resample}]$. For the bootstrap kernel, calculating the weights $w_n^j$ from Algorithm 1 is particularly simple.

Similarly to $\langle\langle \mathbf{t} \rangle\rangle_n$, it is more convenient to define and work with sequences of kernels over traces, rather than terms. We will define $k_{\mathbf{t},n}(s, \cdot)$ to be the sub-probability measure over extended traces $s * s'$ resulting from evaluating the term $r_{\mathbf{t},n-1}(s)$ until the next resample or value $\mathbf{v}$, ignoring any call to weight. First, we immediately have that the set of all traces that do not have $s$ as a prefix must have measure zero. To make this formal, we will use the inverse images of the functions $\mathrm{prepend}_s(s') = s * s'$, $s' \in \mathbb{S}$, in the definition of the kernel.

**Lemma 7.** The functions $\mathrm{prepend}_s : (\mathbb{S}, \mathcal{S}) \to (\mathbb{S}, \mathcal{S})$ are measurable.†

The next ingredient for defining the kernels $k_{\mathbf{t},n}$ is a function $p_{\mathbf{t},n}$ that indicates which traces are possible when executing $\mathbf{t}$ until the $(n+1)$th resample or value.

**Definition 23.**
$$p_{\mathbf{t},n}(s) = \begin{cases} 1 & \text{if } \mathbf{t}, \cdot, s, n \stackrel{\approx}{\longrightarrow} \mathbf{v}, \cdot, ()_{\mathbb{S}}, \cdot \\ 1 & \text{if } \mathbf{t}, \cdot, s, n \stackrel{\approx}{\longrightarrow} \mathbf{E}[\mathbf{resample}], \cdot, ()_{\mathbb{S}}, 0 \\ 0 & \text{otherwise} \end{cases}$$

Note the similarities to Definition 9. In particular, $f_{\mathbf{t},n}(s) > 0$ implies $p_{\mathbf{t},n}(s) = 1$. However, note that $f_{\mathbf{t},n}(s) = 0$ does not imply $p_{\mathbf{t},n}(s) = 0$, since $p_{\mathbf{t},n}$ ignores weights. As an example, $f_{(\mathbf{weight}\ 0),n}(()_{\mathbb{S}}) = 0$, while $p_{(\mathbf{weight}\ 0),n}(()_{\mathbb{S}}) = 1$.

**Lemma 8.** $p_{\mathbf{t},n} : (\mathbb{S}, \mathcal{S}) \to (\mathbb{R}_+, \mathcal{B}_+)$ is measurable.

The proof is analogous to that of Lemma 6. We can now formally define the kernels $k_{\mathbf{t},n}$.

**Definition 24.** $k_{\mathbf{t},n}(s, S) = \int_{\mathrm{prepend}_s^{-1}(S)} p_{r_{\mathbf{t},n-1}(s),1}(s') \, d\mu_{\mathbb{S}}(s')$

By the definition of $p_{\mathbf{t},n}$, the $k_{\mathbf{t},n}$ are sub-probability kernels rather than probability kernels. Intuitively, the reason for this is that during evaluation, terms can get stuck, deterministically diverge, or even stochastically diverge. Such traces are assigned weight 0 by $p_{\mathbf{t},n}$.

**Lemma 9.** The functions $k_{\mathbf{t},n} : \mathbb{S} \times \mathcal{S} \to \mathbb{R}_+$ are sub-probability kernels.†<sup>7</sup>

We get a natural starting measure $q_0$ from the sub-probability distribution resulting from running the initial program $\mathbf{t}$ until reaching a value or a call to resample, ignoring weights.

**Definition 25.** $\langle \mathbf{t} \rangle_0(S) = \int_S p_{\mathbf{t},0}(s) \, d\mu_{\mathbb{S}}(s)$.

Now we have all the ingredients for the general SMC algorithm described in Section 5.2: a sequence of target measures $\langle\langle \mathbf{t} \rangle\rangle_n = \tilde{\pi}_n$ (Definition 22), a starting measure $\langle \mathbf{t} \rangle_0 \propto q_0$ (Definition 25), and a sequence of kernels $k_{\mathbf{t},n} \propto k_n$ (Definition 24). These then induce a sequence of proposal measures $\langle \mathbf{t} \rangle_n = \tilde{q}_n$ as in Equation (10), which we instantiate in the following definition.

**Definition 26.** $\langle \mathbf{t} \rangle_n(S) = \int_{\mathbb{S}} k_{\mathbf{t},n}(s, S) \, f_{\mathbf{t},n-1}(s) \, d\mu_{\mathbb{S}}(s)$, $n > 0$

Intuitively, the measures $\langle \mathbf{t} \rangle_n$ are obtained by evaluating the terms in the support of the measure $\langle\langle \mathbf{t} \rangle\rangle_{n-1}$ until reaching the next resample or value. For an efficient implementation, we need to factorize this definition into the history and the current step, which amounts to splitting the traces. Each feasible trace can be split in such a way.

<sup>7</sup> We only give a partial proof of this lemma.

**Algorithm 2** A concrete instantiation of Algorithm 1 with $\tilde{\pi}_n = \langle\langle \mathbf{t} \rangle\rangle_n$, $k_n \propto k_{\mathbf{t},n}$, $q_0 \propto \langle \mathbf{t} \rangle_0$, and as a consequence $\tilde{q}_n = \langle \mathbf{t} \rangle_n$ (for $n > 0$). In each step, we let $1 \le j \le J$, where $J$ is the number of samples.

As a consequence of Lemma 13, computing the weights is trivial: simply set $w_n^j$ to the weight accumulated while running $\mathbf{t}$ in step (1), or $r_{\mathbf{t},n-1}(\hat{s}_{n-1}^j)$ in step (5). The empirical distribution given by $\{(s_n^j, w_n^j)\}_{j=1}^J$ approximates $\langle\langle \mathbf{t} \rangle\rangle_n / Z_{\langle\langle \mathbf{t} \rangle\rangle_n}$.


**Lemma 10.** Let $n > 0$. If $f_{\mathbf{t},n}(s) > 0$, then $f_{\mathbf{t},n}(s) = f_{\mathbf{t},n-1}(\underline{s}) \, f_{r_{\mathbf{t},n-1}(\underline{s}),1}(\overline{s})$ for exactly one decomposition $\underline{s} * \overline{s} = s$. If $f_{\mathbf{t},n}(s) = 0$, then $f_{\mathbf{t},n-1}(\underline{s}) \, f_{r_{\mathbf{t},n-1}(\underline{s}),1}(\overline{s}) = 0$ for all decompositions $\underline{s} * \overline{s} = s$. As a consequence, if $f_{\mathbf{t},n}(s) > 0$, then $p_{r_{\mathbf{t},n-1}(\underline{s}),1}(\overline{s}) = 1$.†

This gives a more efficiently computable definition of the density.

**Lemma 11.** For $n \in \mathbb{N}$, $\langle \mathbf{t} \rangle_n(S) = \int_S f_{\mathbf{t},n-1}(\underline{s}) \, p_{r_{\mathbf{t},n-1}(\underline{s}),1}(\overline{s}) \, d\mu_{\mathbb{S}}(s)$, where $\underline{s} * \overline{s} = s$ is the unique decomposition from Lemma 10.†<sup>8</sup>

Since the kernels $k_{\mathbf{t},n}$ are sub-probability kernels, the measures $\langle \mathbf{t} \rangle_n$ are finite given that the $\langle\langle \mathbf{t} \rangle\rangle_n$ are finite.

**Lemma 12.** $\langle \mathbf{t} \rangle_0$ is a sub-probability measure. Also, if $\langle\langle \mathbf{t} \rangle\rangle_{n-1}$ is finite, then $\langle \mathbf{t} \rangle_n$ is finite.†

As discussed in Section 6.2, the $\langle\langle \mathbf{t} \rangle\rangle_n$ are finite, either by assumption (Theorem 1) or as a consequence of the dominating function of Theorem 2. From this

<sup>8</sup> We only give a proof sketch for this lemma.

and Lemma 12, the $\langle \mathbf{t} \rangle_n$ are also finite. Furthermore, checking that the $\langle \mathbf{t} \rangle_n$ are valid, i.e., that the density $f_{\langle \mathbf{t} \rangle_n}$ of each $\langle \mathbf{t} \rangle_n$ covers the density $f_{\langle\langle \mathbf{t} \rangle\rangle_n}$ of $\langle\langle \mathbf{t} \rangle\rangle_n$, is trivial. As such, by Lemma 5, we can now correctly approximate $\langle\langle \mathbf{t} \rangle\rangle_n$ using Algorithm 1. The details are given in Algorithm 2, which closely resembles the standard SMC algorithm in WebPPL. For ease of notation, we assume that it is possible to draw samples from $\langle \mathbf{t} \rangle_0$ and $k_{\mathbf{t},n}(s, \cdot)$, even though these are sub-probability measures. This essentially corresponds to assuming that evaluation never gets stuck or diverges. Ensuring that this is the case is not within the scope of this paper. The weights in Algorithm 2 at time step $n$ can easily be calculated according to the following lemma.

**Lemma 13.**
$$w_n(s) = \frac{f_{\langle\langle \mathbf{t} \rangle\rangle_n}(s)}{f_{\langle \mathbf{t} \rangle_n}(s)} = \begin{cases} f_{r_{\mathbf{t},n-1}(\underline{s}),1}(\overline{s}) & \text{if } n > 0 \\ f_{\mathbf{t},0}(s) & \text{if } n = 0 \end{cases}$$
when $f_{\langle \mathbf{t} \rangle_n}(s) > 0$.

Here, $\underline{s} * \overline{s} = s$ is the unique decomposition from Lemma 10.†
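To make Algorithm 2 concrete, the following is a minimal Python sketch of a BPF loop in its style (our own, not the paper's implementation). Since Python cannot copy a running closure, we represent a particle's state between resample barriers explicitly: a model is an init function plus a step function that runs a particle to its next resample and returns the weight accumulated on the way, as in Lemma 13. The toy model, observation data, and names are our assumptions:

```python
import random

def bpf(init, step, n_steps, J, rng):
    """Minimal bootstrap particle filter in the style of Algorithm 2.

    init(rng) -> state starts one particle; step(n, state, rng) -> (state, w)
    runs a particle to its next resample, returning the weight accumulated on
    the way. Returns the final states and the Z estimate from Equation (11)."""
    states = [init(rng) for _ in range(J)]
    z_hat = 1.0
    for n in range(n_steps):
        # Mutate and weight: run each particle to its next resample.
        moved = [step(n, s, rng) for s in states]
        weights = [w for _, w in moved]
        z_hat *= sum(weights) / J
        # Select: multinomial resampling; weights implicitly reset to 1.
        states = rng.choices([s for s, _ in moved], weights=weights, k=J)
    return states, z_hat

# Toy model (cf. the beta example of Section 4): p ~ Uniform(0, 1), and each
# step weights by the likelihood of one observed coin flip, followed by a
# resample, i.e. the WebPPL-style weight-then-resample placement.
obs = [1, 1, 0, 1]
init = lambda rng: rng.random()
step = lambda n, p, rng: (p, p if obs[n] else 1.0 - p)

states, z_hat = bpf(init, step, len(obs), 10_000, random.Random(0))
posterior_mean = sum(states) / len(states)
# Analytically, Z = B(4, 2) = 1/20 and the posterior mean is 4/6.
```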

### **7.2 Other SMC Algorithms**

In this section, we discuss SMC algorithms other than the BPF.

First, we have the resample-move algorithm by Gilks and Berzuini [11], which is also implemented in WebPPL [13], and treated by Chopin [6] and Ścibior et al. [33]. In this algorithm, the SMC kernel is composed with a suitable MCMC kernel, such that one or more MCMC steps are taken for each sample after each resampling. This helps with the so-called degeneracy problem in SMC, which refers to the tendency of SMC samples to share a common ancestry as a result of resampling. We can achieve this algorithm directly in our context by simply choosing appropriate transition kernels in Algorithm 1. Let $k_{\text{MCMC},n}$ be MCMC transition kernels with $\tilde{\pi}_{n-1} = \langle\langle \mathbf{t} \rangle\rangle_{n-1}$ as invariant distributions. Using the bootstrap kernels as the main kernels, we let $k_n = k_{\mathbf{t},n} \circ k_{\text{MCMC},n}$, where $\circ$ denotes kernel composition. The sequence $k_n$ is valid because of the validity of the main SMC kernels and the invariance of the MCMC kernels.
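As an illustration of such an MCMC kernel (our own sketch, for a real-valued state whose target density `f` is known only up to a constant), one random-walk Metropolis-Hastings step that leaves the target invariant can be written as:

```python
import random

def mh_step(state, f, rng, scale=1.0):
    """One random-walk Metropolis-Hastings step with a symmetric Gaussian
    proposal; leaves any target with unnormalized density f invariant."""
    proposal = state + rng.gauss(0.0, scale)
    f_cur = f(state)
    # Accept with probability min(1, f(proposal) / f(state)).
    if f_cur <= 0.0 or rng.random() * f_cur < f(proposal):
        return proposal
    return state
```

Composing the bootstrap kernel with one or more such steps after resampling gives the resample-move variant sketched above.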

While Algorithm 1 captures different SMC algorithms by allowing the use of different kernels, some algorithms require changes to Algorithm 1 itself. The first such variation of Algorithm 1 is the alive particle filter, recently discussed by Kudlicka et al. [19], which reduces the tendency to degeneracy by not including sample traces with zero weight in resampling. This is done by repeating the selection and mutation steps (for each sample individually) until a trace with non-zero weight is proposed; the corresponding modifications to Algorithm 1 are straightforward. The unbiasedness result of Kudlicka et al. [19] can easily be extended to our PPL context, with another minor modification to Algorithm 1.

Another variation of Algorithm 1 is the auxiliary particle filter [28]. Informally, this algorithm allows the selection and mutation steps of Algorithm 1 to be guided by future information regarding the weights $w_n$. For many models, this is possible since the weighting functions $w_n$ from Algorithm 1 are often parametric in an explicitly available sequence of observation data points, which can also be used to derive better kernels $k_n$. Clearly, such optimizations are model-specific, and cannot directly be applied in expressive PPL calculi such as ours. However, the general idea of using look-ahead in general-purpose PPLs to guide selection and mutation is interesting, and should be explored.

# **8 Related Work**

The only major previous work related to formal SMC correctness in PPLs is Ścibior et al. [33] (see Section 1). They validate both the BPF and the resample-move SMC algorithms in a denotational setting. In a companion paper, Ścibior et al. [32] also give a Haskell implementation of these inference techniques.

Although formal correctness proofs of SMC in PPLs are sparse, there are many languages that implement SMC algorithms. Goodman and Stuhlmüller [14] describe SMC for the probabilistic programming language WebPPL. They implement a basic BPF very similar to Algorithm 2, but do not show correctness with respect to any language semantics. Also related to WebPPL, Stuhlmüller et al. [36] discuss a coarse-to-fine SMC inference technique for probabilistic programs with independent sample statements.

Wood et al. [43] describe PMCMC, an MCMC inference technique that uses SMC internally, for the probabilistic programming language Anglican [37]. Similarly to WebPPL, Anglican also includes a basic BPF similar to Algorithm 2, with the exception that every execution needs to encounter the same number of calls to resample. They use various types of empirical tests to validate correctness, in contrast to the formal proof found in this paper. Related to Anglican, a brief discussion on resample placement requirements can be found in van de Meent et al. [41].

Birch [25] is an imperative object-oriented PPL with a particular focus on SMC. It supports a number of SMC algorithms, including the BPF [16] and the auxiliary particle filter [28]. Furthermore, it supports dynamic analytical optimizations, for instance using locally-optimal proposals and Rao–Blackwellization [24]. As with WebPPL and Anglican, the focus is on performance and efficiency, and not on formal correctness.

There are quite a few papers studying the correctness of MCMC algorithms for PPLs. Using the same underlying framework as for their SMC correctness proof, Ścibior et al. [33] also validate a trace MCMC algorithm. Another proof of correctness for trace MCMC is given in Borgström et al. [3], which instead uses an untyped lambda calculus and an operational semantics. Much of the formalization in this paper is based on constructions used as part of their paper. For instance, the functions $f_{\mathbf{t}}$ and $r_{\mathbf{t}}$ are defined similarly, as are the measure space $(\mathbb{S}, \mathcal{S}, \mu_{\mathbb{S}})$ and the measurable space $(\mathbb{T}, \mathcal{T})$. Our measurability proofs of $f_{\mathbf{t}}$, $r_{\mathbf{t}}$, $f_{\mathbf{t},n}$, and $r_{\mathbf{t},n}$ largely follow the same strategies as found in their paper. Similarly to us, they also relate their proof of correctness to classical results from the MCMC literature. A difference is that we use inverse transform sampling, whereas they use probability density functions. As a result, our traces consist of numbers in $[0, 1]$, while their traces consist of numbers in $\mathbb{R}$. Also, inverse transform sampling naturally allows for built-in discrete distributions. In contrast, discrete distributions must be encoded in the language itself when using probability densities. Another difference is that they restrict the arguments to weight to $[0, 1]$, in order to ensure the finiteness of the target measure.
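To illustrate the point about discrete distributions (our own sketch, not code from either paper): a single uniform draw in $[0, 1]$ yields a discrete sample when pushed through the generalized inverse CDF, so discrete distributions need no special encoding under inverse transform sampling.

```python
import bisect
import itertools

def discrete_icdf(probs):
    # Generalized inverse CDF for a discrete distribution on {0, ..., k-1}:
    # maps u in [0, 1) to the least index i with CDF(i) > u.
    cdf = list(itertools.accumulate(probs))
    return lambda u: bisect.bisect_right(cdf, u)

# One uniform draw becomes one Categorical(0.2, 0.5, 0.3) sample:
icdf = discrete_icdf([0.2, 0.5, 0.3])
assert [icdf(0.1), icdf(0.4), icdf(0.9)] == [0, 1, 2]
```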

Other work related to ours includes Jacobs [17], Vákár et al. [39], and Staton et al. [35]. Jacobs [17] discusses problems with models in which observe (related to weight) statements occur conditionally. While our results show that SMC inference for such models is correct, the models themselves may not be useful. Vákár et al. [39] develop a powerful domain theory for term recursion in PPLs, but do not cover SMC inference in particular. Staton et al. [35] develop both operational and denotational semantics for a PPL calculus with higher-order functions, but without recursion. They also briefly mention SMC as a program transformation.

Classical work on SMC includes Chopin [6], which we use as a basis for our formalization. In particular, Chopin [6] provides a general formulation of SMC, placing few requirements on the underlying model. The book by Del Moral [7] contains a vast number of classical SMC results, including the law of large numbers and unbiasedness result from Lemma 5. A more accessible summary of the important SMC convergence results from Del Moral [7] can be found in Naesseth et al. [26].

# **9 Conclusions**

In conclusion, we have formalized SMC inference for an expressive functional PPL calculus, based on the formalization by Chopin [6]. We showed that in this context, SMC is correct in that it approximates the target measures encoded by programs in the calculus under mild conditions. Furthermore, we illustrated a particular instance of SMC for our calculus, the bootstrap particle filter, and discussed other variations of SMC and their relation to our calculus.

As indicated in Section 2, the approach used for selecting resampling locations can have a large impact on SMC accuracy and performance. This leads us to the following general question: can we select optimal resampling locations in a given program, according to some formally defined measure of optimality? We leave this important research direction for future work.

### **Acknowledgments**

We thank our colleagues Lawrence Murray and Fredrik Ronquist for fruitful discussions and ideas. We also thank Sam Staton and the anonymous reviewers at ESOP for their detailed and helpful comments.

# **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Densities of Almost Surely Terminating Probabilistic Programs are Differentiable Almost Everywhere**

Carol Mak, C.-H. Luke Ong, Hugo Paquet, and Dominik Wagner

Department of Computer Science, University of Oxford, Oxford, UK {pui.mak,luke.ong,hugo.paquet,dominik.wagner}@cs.ox.ac.uk

**Abstract.** We study the differential properties of higher-order statistical probabilistic programs with recursion and conditioning. Our starting point is an open problem posed by Hongseok Yang: what class of statistical probabilistic programs have densities that are differentiable almost everywhere? To formalise the problem, we consider Statistical PCF (SPCF), an extension of call-by-value PCF with real numbers, and constructs for sampling and conditioning. We give SPCF a sampling-style operational semantics à la Borgström et al., and study the associated weight (commonly referred to as the density) function and value function on the set of possible execution traces.

Our main result is that almost surely terminating SPCF programs, generated from a set of primitive functions (e.g. the set of analytic functions) satisfying mild closure properties, have weight and value functions that are almost everywhere differentiable. We use a stochastic form of symbolic execution to reason about almost everywhere differentiability. A by-product of this work is that almost surely terminating deterministic (S)PCF programs with real parameters denote functions that are almost everywhere differentiable.

Our result is of practical interest, as almost everywhere differentiability of the density function is required to hold for the correctness of major gradient-based inference algorithms.

# **1 Introduction**

Probabilistic programming refers to a set of tools and techniques for the systematic use of programming languages in Bayesian statistical modelling. Users of probabilistic programming — those wishing to make inferences or predictions — **(i)** encode their domain knowledge in program form; **(ii)** condition certain program variables based on observed data; and **(iii)** make a query. The resulting code is then passed to an inference engine which performs the necessary computation to answer the query, usually following a generic approximate Bayesian inference algorithm. (In some recent systems [5,14], users may also write their own inference code.) The Programming Language community has contributed to the field by developing formal methods for probabilistic programming languages (PPLs), seen as usual languages enriched with primitives for **(i)** sampling and

© The Author(s) 2021

N. Yoshida (Ed.): ESOP 2021, LNCS 12648, pp. 432–461, 2021. https://doi.org/10.1007/978-3-030-72019-3_16

**(ii)** conditioning. (The query **(iii)** can usually be encoded as the return value of the program.)

It is crucial to have access to reasoning principles in this context. The combination of these new primitives with the traditional constructs of programming languages leads to a variety of new computational phenomena, and a major concern is the correctness of inference: given a query, will the algorithm converge, in some appropriate sense, to a correct answer? In a universal PPL (i.e. one whose underlying language is Turing-complete), this is not obvious: the inference engine must account for a wide class of programs, going beyond the more well-behaved models found in many of the current statistical applications. Thus the design of inference algorithms, and the associated correctness proofs, are quite delicate. It is well-known, for instance, that in its original version the popular lightweight Metropolis-Hastings algorithm [53] contained a bug affecting the result of inference [20,25].

Fortunately, research in this area benefits from decades of work on the semantics of programs with random features, starting with pioneering work by Kozen [26] and Saheb-Djahromi [44]. Both operational and denotational models have recently been applied to the validation of inference algorithms: see e.g. [20,8] for the former and [45,10] for the latter. There are other approaches, e.g. using refined type systems [33].

Inference algorithms in probabilistic programming are often based on the concept of program trace, because the operational behaviour of a program is parametrised by the sequence of random numbers it draws along the way. Accordingly a probabilistic program has an associated value function which maps traces to output values. But the inference procedure relies on another function on traces, commonly called the density<sup>1</sup> of the program, which records a cumulative likelihood for the samples in a given trace. Approximating a normalised version of the density is the main challenge that inference algorithms aim to tackle. We will formalise these notions: in Sec. 3 we demonstrate how the value function and density of a program are defined in terms of its operational semantics.

**Contributions.** The main result of this paper is that both the density and value function are differentiable almost everywhere (that is, everywhere but on a set of measure zero), provided the program is almost surely terminating in a suitable sense. Our result holds for a universal language with recursion and higher-order functions. We emphasise that it follows immediately that purely deterministic programs with real parameters denote functions that are almost everywhere differentiable. This class of programs is important, because they can express machine learning models which rely on gradient descent [30].

This result is of practical interest, because many modern inference algorithms are "gradient-based": they exploit the derivative of the density function in order to optimise the approximation process. This includes the well-known methods of Hamiltonian Monte-Carlo [15,37] and stochastic variational inference [18,40,6,27]. But these techniques can only be applied when the derivative

<sup>1</sup> For some readers this terminology may be ambiguous; see Remark 1 for clarification.

exists "often enough", and thus, in the context of probabilistic programming, almost everywhere differentiability is often cited as a requirement for correctness [55,31]. The question of which probabilistic programs satisfy this property was selected by Hongseok Yang in his FSCD 2019 invited lecture [54] as one of three open problems in the field of semantics for probabilistic programs.

Points of non-differentiability exist largely because of branching, which typically arises in a program when the control flow reaches a conditional statement. Hence our work is a study of the connections between the traces of a probabilistic program and its branching structure. To achieve this we introduce stochastic symbolic execution, a form of operational semantics for probabilistic programs, designed to identify sets of traces corresponding to the same control-flow branch. Roughly, a reduction sequence in this semantics corresponds to a control flow branch, and the rules additionally provide for every branch a symbolic expression of the trace density, parametrised by the outcome of the random draws that the branch contains. We obtain our main result in conjunction with a careful analysis of the branching structure of almost surely terminating programs.

**Outline.** We devote Sec. 2 to a more detailed introduction to the problem of trace-based inference in probabilistic programming, and the issue of differentiability in this context. In Sec. 3, we present a trace-based operational semantics for Statistical PCF, a prototypical higher-order functional language previously studied in the literature. This is followed by a discussion of differentiability and almost sure termination of programs (Sec. 4). In Sec. 5 we define the "symbolic" operational semantics required for the proof of our main result, which we present in Sec. 6. We discuss related work and further directions in Sec. 7.

For the extended version of the paper refer to [34].

# **2 Probabilistic Programming and Trace-Based Inference**

In this section we give a short introduction to probabilistic programs and the densities they denote, and we motivate the need for gradient-based inference methods. Our account relies on classical notions from measure theory, so we start with a short recap.

### **2.1 Measures and Densities**

A *measurable space* is a pair $(X, \Sigma_X)$ consisting of a set together with a *σ-algebra* of subsets, i.e. $\Sigma_X \subseteq \mathcal{P}(X)$ contains $\emptyset$ and is closed under complements and countable unions and intersections. Elements of $\Sigma_X$ are called *measurable sets*. A *measure* on $(X, \Sigma_X)$ is a function $\mu : \Sigma_X \to [0, \infty]$ satisfying $\mu(\emptyset) = 0$, and $\mu(\bigcup_{i \in I} U_i) = \sum_{i \in I} \mu(U_i)$ for every countable family $\{U_i\}_{i \in I}$ of pairwise disjoint measurable subsets. A (possibly partial) function $f : X \rightharpoonup Y$ is *measurable* if for every $U \in \Sigma_Y$ we have $f^{-1}(U) \in \Sigma_X$.

The space $\mathbb{R}$ of real numbers is an important example. The (Borel) σ-algebra $\Sigma_{\mathbb{R}}$ is the smallest one containing all intervals $[a, b)$, and the *Lebesgue measure* $\mathrm{Leb}$ is the unique measure on $(\mathbb{R}, \Sigma_{\mathbb{R}})$ satisfying $\mathrm{Leb}([a, b)) = b - a$. For measurable spaces $(X, \Sigma_X)$ and $(Y, \Sigma_Y)$, the *product σ-algebra* $\Sigma_{X \times Y}$ is the smallest one containing all $U \times V$, where $U \in \Sigma_X$ and $V \in \Sigma_Y$. So in particular we get for each $n \in \mathbb{N}$ a space $(\mathbb{R}^n, \Sigma_{\mathbb{R}^n})$, and additionally there is a unique measure $\mathrm{Leb}_n$ on $\mathbb{R}^n$ satisfying $\mathrm{Leb}_n(\prod_i U_i) = \prod_i \mathrm{Leb}(U_i)$.

When a function $f : X \to \mathbb{R}$ is measurable and non-negative and $\mu$ is a measure on $X$, for each $U \in \Sigma_X$ we can define the *integral* $\int_U f \, \mathrm{d}\mu \in [0, \infty]$. Common families of probability distributions on the reals (Uniform, Normal, etc.) are examples of measures on $(\mathbb{R}, \Sigma_{\mathbb{R}})$. Most often these are defined in terms of probability density functions with respect to the Lebesgue measure, meaning that for each $\mu_D$ there is a measurable function $\mathrm{pdf}_D : \mathbb{R} \to \mathbb{R}_{\geq 0}$ which determines it: $\mu_D(U) = \int_U \mathrm{pdf}_D \, \mathrm{d}\mathrm{Leb}$. As we will see, density functions such as $\mathrm{pdf}_D$ have a central place in Bayesian inference.

Formally, if $\mu$ is a measure on a measurable space $X$, a *density* for $\mu$ with respect to another measure $\nu$ on $X$ (most often $\nu$ is the Lebesgue measure) is a measurable function $f : X \to \mathbb{R}$ such that $\mu(U) = \int_U f \, \mathrm{d}\nu$ for every $U \in \Sigma_X$. In the context of the present work, an inference algorithm can be understood as a method for approximating a distribution of which we only know the density up to a normalising constant. In other words, if the algorithm is fed a (measurable) function $g : X \to \mathbb{R}$, it should produce samples approximating the probability measure $U \mapsto \frac{\int_U g \, \mathrm{d}\nu}{\int_X g \, \mathrm{d}\nu}$ on $X$.
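A minimal sketch of this normalisation, under the assumption that we can sample from $\nu$ directly, is self-normalized importance sampling: draw from $\nu$, weight by $g$, and divide by the total weight (all names here are ours, for illustration only).

```python
import random

def snis_probability(g, sample_nu, indicator, n=100_000):
    # Self-normalized importance sampling estimate of
    # (integral of g over U) / (integral of g over X),
    # drawing from the base measure nu and weighting each draw by g.
    xs = [sample_nu() for _ in range(n)]
    ws = [g(x) for x in xs]
    numerator = sum(w for x, w in zip(xs, ws) if indicator(x))
    return numerator / sum(ws)

random.seed(1)
# Toy check: nu = Uniform(0,1), g(x) = 2x; the normalized measure has
# CDF u^2, so the mass of U = [0, 0.5) is 0.25.
est = snis_probability(lambda x: 2 * x, random.random, lambda x: x < 0.5)
```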

We will make use of some basic notions from topology: given a topological space $X$ and a set $A \subseteq X$, the *interior* of $A$ is the largest open set $\mathring{A}$ contained in $A$. Dually the *closure* of $A$ is the smallest closed set $\overline{A}$ containing $A$, and the *boundary* of $A$ is defined as $\partial A := \overline{A} \setminus \mathring{A}$. Note that for all $U \subseteq \mathbb{R}^n$, all of $\mathring{U}$, $\overline{U}$ and $\partial U$ are measurable (in $\Sigma_{\mathbb{R}^n}$).

### **2.2 Probabilistic Programming: a (Running) Example**

Our running example is based on a random walk in **R**≥<sup>0</sup>.

The story is as follows: a pedestrian has gone on a walk on a certain semi-infinite street (i.e. extending infinitely on one side), where she may periodically change directions. Upon reaching the end of the street she has forgotten her starting point, only remembering that she started no more than 3km away. Thanks to an odometer, she knows the total distance she has walked is 1.1km, although there is a small margin of error. Her starting point can be inferred using probabilistic programming, via the program in Fig. 1a.

The function walk in Fig. 1a is a recursive simulation of the random walk: note that in this model a new direction is sampled after at most 1km. Once the pedestrian has travelled past 0 the function returns the total distance travelled. The rest of the program first specifies a prior distribution for the starting point, representing the pedestrian's belief — uniform distribution on [0, 3] — before observing the distance measured by the odometer. After drawing a value for start the program simulates a random walk, and the execution is weighted (via score) according to how close distance is to the observed value of 1.1. The return value is our query: it indicates that we are interested in the posterior distribution on the starting point.

Fig. 1: (a) Running example in pseudo-code. (b) Resulting histogram.

The histogram in Fig. 1b is obtained by sampling repeatedly from the posterior of a Python model of our running example. It shows the mode of the pedestrian's starting point to be around the 0.8km mark.

To approximate the posterior, inference engines for probabilistic programs often proceed indirectly and operate on the space of program traces, rather than on the space of possible return values. By trace, we mean the sequence of samples drawn in the course of a particular run, one for each random primitive encountered. Because each random primitive (qua probability distribution) in the language comes with a density, given a particular trace we can compute a coefficient as the appropriate product. We can then multiply this coefficient by all scores encountered in the execution, and this yields a (weight) function, mapping traces to the non-negative reals, over which the chosen inference algorithm may operate. This indirect approach is more practical, and enough to answer the query, since every trace unambiguously induces a return value.
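To make the trace view concrete, here is an illustrative Python replay of the pedestrian model on a fixed trace; `ped_weight` and `normal_pdf` are our own names, and we read the second parameter of the normal density as a standard deviation of 0.1 — an assumption, chosen to match the weight 0.54 computed in Ex. 2 later in the paper.

```python
import math

def normal_pdf(x, mean, sd):
    # Density of N(mean, sd) with respect to the Lebesgue measure.
    return math.exp(-((x - mean) ** 2) / (2 * sd * sd)) / (sd * math.sqrt(2 * math.pi))

def ped_weight(trace):
    # Deterministically replay the pedestrian model on a fixed trace,
    # returning (start, weight). Raises StopIteration if the trace is
    # too short; for arbitrary traces the walk may consume more draws.
    draws = iter(trace)
    start = 3.0 * next(draws)            # prior: Uniform(0, 3)
    pos, dist = start, 0.0
    while pos > 0:                       # recursive random walk
        step = next(draws)               # step length in (0, 1)
        dist += step
        pos += step if next(draws) <= 0.5 else -step
    weight = normal_pdf(dist, 1.1, 0.1)  # score against the odometer reading
    return start, weight
```

For the trace [0.2, 0.9, 0.7] this replays the run shown later in Ex. 2: start 0.6, one walk step of 0.9 away from the street end, and a weight of roughly 0.54.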

Remark 1. In much of the probabilistic programming literature (e.g. [31,55,54], including this paper), the above-mentioned weight function on traces is called the density of the probabilistic program. This may be confusing: as we have seen, a probabilistic program induces a posterior probability distribution on return values, and it is natural to ask whether this distribution admits a probability density function (Radon-Nikodym derivative) w.r.t. some base measure. This problem is of current interest [2,3,21] but unrelated to the present work.

### **2.3 Gradient-Based Approximate Inference**

Some of the most influential and practically important inference algorithms make use of the gradient of the density functions they operate on, when these are differentiable. Generally the use of gradient-based techniques allow for much greater efficiency in inference.

A popular example is the Markov Chain Monte Carlo algorithm known as Hamiltonian Monte Carlo (HMC) [15,37]. Given a density function $g : X \to \mathbb{R}$, HMC samples are obtained as the states of a Markov chain by (approximately) simulating Hamilton's equations via an integrator that uses the gradient $\nabla_x\, g(x)$. Another important example is (stochastic) variational inference [18,40,6,27], which transforms the posterior inference problem to an optimisation problem. This method takes two inputs: the posterior density function of interest $g : X \to \mathbb{R}$, and a function $h : \Theta \times X \to \mathbb{R}$; typically, the latter function is a member of an expressive and mathematically well-behaved family of densities that are parameterised in $\Theta$. The idea is to use stochastic gradient descent to find the parameter $\theta \in \Theta$ that minimises the "distance" (typically the Kullback–Leibler divergence) between $h(\theta, -)$ and $g$, relying on a suitable estimate of the gradient of the objective function. When $g$ is the density of a probabilistic program (the model), $h$ can be specified as the density of a second program (the guide) whose traces have additional $\theta$-parameters. The gradient of the objective function is then estimated in one approach (score function [41]) by computing the gradient $\nabla_\theta\, h(\theta, x)$, and in another (reparameterised gradient [24,42,49]) by computing the gradient $\nabla_x\, g(x)$.
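As an illustration of how the gradient enters (a textbook sketch, not tied to any particular PPL or to this paper's formalism), one HMC transition uses $\nabla_x \log g$ inside a leapfrog integrator before a Metropolis accept/reject step:

```python
import math
import random

def hmc_step(x, logp, grad_logp, step=0.1, n_leapfrog=20):
    # One HMC transition: simulate Hamiltonian dynamics with the leapfrog
    # integrator (which needs grad_logp), then accept or reject.
    p = random.gauss(0.0, 1.0)           # resample momentum
    x_new, p_new = x, p
    p_new += 0.5 * step * grad_logp(x_new)       # half step for momentum
    for i in range(n_leapfrog):
        x_new += step * p_new                    # full step for position
        if i != n_leapfrog - 1:
            p_new += step * grad_logp(x_new)     # full step for momentum
    p_new += 0.5 * step * grad_logp(x_new)       # final half step
    h_old = -logp(x) + 0.5 * p * p               # Hamiltonian before
    h_new = -logp(x_new) + 0.5 * p_new * p_new   # Hamiltonian after
    if math.log(random.random()) < h_old - h_new:
        return x_new
    return x

# Toy target: standard normal, log g(x) = -x^2/2, gradient -x.
random.seed(0)
x, samples = 0.0, []
for _ in range(2000):
    x = hmc_step(x, lambda y: -0.5 * y * y, lambda y: -y)
    samples.append(x)
```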

In probabilistic programming, the above inference methods must be adapted to deal with the fact that in a universal PPL, the set of random primitives encountered can vary between executions, and traces can have arbitrary and unbounded dimension; moreover, the density function of a probabilistic program is generally not (everywhere) differentiable. Crucially these adapted algorithms are only valid when the input densities are almost everywhere differentiable [55,38,32]; this is the subject of this paper.

Our main result (Thm. 3) states that the weight function and value function of almost surely terminating SPCF programs are almost everywhere differentiable. This applies to our running example: the program in Fig. 1a (expressible in SPCF using primitive functions that satisfy Assumption 1 – see Ex. 1) is almost surely terminating.

# **3 Sampling Semantics for Statistical PCF**

In this section, we present a simply-typed statistical probabilistic programming language with recursion and its operational semantics.

$$\begin{array}{l}
\sigma,\tau ::= \mathsf{R} \mid \sigma \Rightarrow \tau\\[1ex]
M,N,L ::= y \mid \underline{r} \mid \underline{f}(M_1,\ldots,M_\ell) \mid \lambda y.M \mid M\,N \mid \mathsf{Y}M \mid \mathsf{if}(L \leq 0, M, N) \mid \mathsf{sample} \mid \mathsf{score}(M)
\end{array}$$

$$\frac{\vphantom{M}}{\Gamma \vdash \mathsf{sample} : \mathsf{R}} \qquad \frac{\Gamma \vdash M : \mathsf{R}}{\Gamma \vdash \mathsf{score}(M) : \mathsf{R}} \qquad \frac{\Gamma \vdash M : (\sigma \Rightarrow \tau) \Rightarrow (\sigma \Rightarrow \tau)}{\Gamma \vdash \mathsf{Y}M : \sigma \Rightarrow \tau}$$

Fig. 2: Syntax of SPCF, where $r \in \mathbb{R}$, $x, y$ are variables, and $\underline{f} : \mathbb{R}^n \rightharpoonup \mathbb{R}$ ranges over a set $\mathcal{F}$ of partial, measurable *primitive functions* (see Sec. 4.2).

### **3.1 Statistical PCF**

*Statistical PCF* (SPCF) is a higher-order probabilistic programming language with recursion, in purified form. The terms and part of the (standard) typing system of SPCF are presented in Fig. 2.<sup>2</sup> In the rest of the paper we write $\boldsymbol{x}$ to represent a sequence of variables $x_1, \ldots, x_n$, $\Lambda$ for the set of SPCF terms, and $\Lambda^0$ for the set of closed SPCF terms. In the interest of readability, we sometimes use pseudo-code (e.g. Fig. 1a) in the style of Core ML to express SPCF terms.

SPCF is a statistical probabilistic version of call-by-value PCF [46,47] with reals as the ground type. The probabilistic constructs of SPCF are relatively standard (see for example [48]): the sampling construct sample draws from U(0, 1), the standard uniform distribution with end points 0 and 1; the scoring construct score(M) enables conditioning on observed data by multiplying the weight of the current execution with the (non-negative) real number denoted by M. Sampling from other real-valued distributions can be obtained from U(0, 1) by applying the inverse of the distribution's cumulative distribution function.
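As a concrete (illustrative) instance of this recipe — our own example, not a construct of SPCF — a draw from an exponential distribution is obtained by pushing a uniform sample through the inverse of its cumulative distribution function:

```python
import math

def exponential_icdf(rate):
    # Inverse CDF of Exp(rate): F(x) = 1 - exp(-rate * x), so
    # F^{-1}(u) = -ln(1 - u) / rate maps a U(0,1) draw to an Exp sample.
    return lambda u: -math.log(1.0 - u) / rate

# A uniform draw u = 0.5 becomes the median of Exp(1), i.e. ln 2.
icdf = exponential_icdf(1.0)
assert abs(icdf(0.5) - math.log(2)) < 1e-12
```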

Our SPCF is an (inconsequential) variant of CBV SPCF [51] and a (CBV) extension of PPCF [16] with scoring; it may be viewed as a simply-typed version of the untyped probabilistic languages of [8,13,52].

Example 1 (Running Example Ped). We express in SPCF the example in Fig. 1a.

$$\begin{array}{l}
\mathsf{Ped} \equiv \begin{pmatrix} \mathsf{let}\ x = \mathsf{sample} \cdot \underline{3}\ \mathsf{in}\\ \mathsf{let}\ d = \mathsf{walk}\ x\ \mathsf{in}\\ \mathsf{let}\ w = \mathsf{score}(\underline{\mathsf{pdf}_{\mathcal{N}(1.1,0.1)}}(d))\ \mathsf{in}\ x \end{pmatrix} \qquad \text{where}\\[2ex]
\mathsf{walk} \equiv \mathsf{Y}\left( \lambda f x.\ \mathsf{if}\left(x \leq \underline{0},\ \underline{0},\ \begin{array}{l}\mathsf{let}\ s = \mathsf{sample}\ \mathsf{in}\\ \mathsf{if}\big(\mathsf{sample} \leq \underline{0.5},\ s + f(x+s),\ s + f(x-s)\big)\end{array}\right) \right)
\end{array}$$

The let construct, let x = N in M, is syntactic sugar for the term (λx.M) N; and $\mathsf{pdf}_{\mathcal{N}(1.1,0.1)}$, the density function of the normal distribution with mean 1.1 and variance 0.1, is a primitive function. To enhance readability we use infix notation and omit the underline for standard functions such as addition.

<sup>2</sup> In Fig. 2 and in other figures, we highlight the elements that are new or otherwise noteworthy.

### **3.2 Operational Semantics**

The execution of a probabilistic program generates a trace: a sequence containing the values sampled during a run. Our operational semantics captures this dynamic perspective. This is closely related to the treatment in [8] which, following [26], views a probabilistic program as a deterministic program parametrized by the sequence of random draws made during the evaluation.

**Traces.** Recall that in our language, sample produces a random value in the open unit interval; accordingly a *trace* is a finite sequence of elements of $(0, 1)$. We define a *measure space* $\mathbb{S}$ *of traces* to be the set $\bigcup_{n \in \mathbb{N}} (0, 1)^n$, equipped with the standard disjoint union σ-algebra, and the sum of the respective (higher-dimensional) Lebesgue measures. Formally, writing $\mathbb{S}_n := (0, 1)^n$, we define:

$$\mathbb{S} := \left( \bigcup\_{n \in \mathbb{N}} \mathbb{S}\_n, \left\{ \bigcup\_{n \in \mathbb{N}} U\_n \mid U\_n \in \Sigma\_{\mathbb{S}\_n} \right\}, \mu\_{\mathbb{S}} \right) \text{ and } \mu\_{\mathbb{S}} \left( \bigcup\_{n \in \mathbb{N}} U\_n \right) := \sum\_{n \in \mathbb{N}} \text{Leb}\_n(U\_n).$$

Henceforth we write traces as lists, such as $[0.5, 0.999, 0.12]$; the empty trace as $[\,]$; and the concatenation of traces $\boldsymbol{s}, \boldsymbol{s}' \in \mathbb{S}$ as $\boldsymbol{s} \mathbin{+\!\!+} \boldsymbol{s}'$.

More generally, to account for open terms, we define, for each m ∈ **N**, the measure space

$$\mathbb{R}^m \times \mathbb{S} := \left( \bigcup\_{n \in \mathbb{N}} \mathbb{R}^m \times \mathbb{S}\_n, \left\{ \bigcup\_{n \in \mathbb{N}} V\_n \mid V\_n \in \Sigma\_{\mathbb{R}^m \times \mathbb{S}\_n} \right\}, \mu\_{\mathbb{R}^m \times \mathbb{S}} \right),$$

where $\mu_{\mathbb{R}^m \times \mathbb{S}}\left(\bigcup_{n \in \mathbb{N}} V_n\right) := \sum_{n \in \mathbb{N}} \mathrm{Leb}_{m+n}(V_n)$. To avoid clutter, we will elide the subscript from $\mu_{\mathbb{R}^m \times \mathbb{S}}$ whenever it is clear from the context.

**Small-Step Reduction.** Next, we define the *values* (typically denoted V ), *redexes* (typically R) and *evaluation contexts* (typically E):

$$\begin{aligned}
V &::= \underline{r} \mid \lambda y.M\\
R &::= (\lambda y.M)\,V \mid \underline{f}(\underline{r_1},\ldots,\underline{r_\ell}) \mid \mathsf{Y}(\lambda y.M) \mid \mathsf{if}(\underline{r} \leq 0, M, N) \mid \mathsf{sample} \mid \mathsf{score}(\underline{r})\\
E &::= [\,] \mid E\,M \mid (\lambda y.M)\,E \mid \underline{f}(\underline{r_1},\ldots,\underline{r_{i-1}},E,M_{i+1},\ldots,M_\ell) \mid \mathsf{Y}E \mid \mathsf{if}(E \leq 0, M, N) \mid \mathsf{score}(E)
\end{aligned}$$

We write $\Lambda_v$ for the set of SPCF values, and $\Lambda_v^0$ for the set of closed SPCF values.

It is easy to see that every closed SPCF term M is either a value, or there exists a unique pair of context E and redex R such that M ≡ E[R].

We now present the operational semantics of SPCF as a rewrite system of *configurations*, which are triples of the form $\langle M, w, \boldsymbol{s} \rangle$ where $M$ is a closed SPCF term, $w \in \mathbb{R}_{\geq 0}$ is a *weight*, and $\boldsymbol{s} \in \mathbb{S}$ a trace. (We will sometimes refer to

#### *Redex Contractions*:

$$\begin{aligned}
\langle (\lambda y.M)\, V, w, \boldsymbol{s} \rangle &\to \langle M[V/y], w, \boldsymbol{s} \rangle\\
\langle \underline{f}(\underline{r_1}, \ldots, \underline{r_\ell}), w, \boldsymbol{s} \rangle &\to \langle \underline{f(r_1, \ldots, r_\ell)}, w, \boldsymbol{s} \rangle &&(\text{if } f(r_1, \ldots, r_\ell) \text{ defined})\\
\langle \underline{f}(\underline{r_1}, \ldots, \underline{r_\ell}), w, \boldsymbol{s} \rangle &\to \mathbf{fail} &&(\text{otherwise})\\
\langle \mathsf{Y}(\lambda y.M), w, \boldsymbol{s} \rangle &\to \langle \lambda z.M[\mathsf{Y}(\lambda y.M)/y]\,z, w, \boldsymbol{s} \rangle &&(\text{for fresh variable } z)\\
\langle \mathsf{if}(\underline{r} \leq 0, M, N), w, \boldsymbol{s} \rangle &\to \langle M, w, \boldsymbol{s} \rangle &&(\text{for } r \leq 0)\\
\langle \mathsf{if}(\underline{r} \leq 0, M, N), w, \boldsymbol{s} \rangle &\to \langle N, w, \boldsymbol{s} \rangle &&(\text{for } r > 0)\\
\langle \mathsf{sample}, w, \boldsymbol{s} \rangle &\to \langle \underline{r}, w, \boldsymbol{s} \mathbin{+\!\!+} [r] \rangle &&(\text{for some } r \in (0, 1))\\
\langle \mathsf{score}(\underline{r}), w, \boldsymbol{s} \rangle &\to \langle \underline{r}, r \cdot w, \boldsymbol{s} \rangle &&(\text{for } r \geq 0)\\
\langle \mathsf{score}(\underline{r}), w, \boldsymbol{s} \rangle &\to \mathbf{fail} &&(\text{for } r < 0)
\end{aligned}$$

### *Evaluation Contexts*:

$$\frac{\langle R, w, \mathbf{s} \rangle \to \langle R', w', \mathbf{s'} \rangle}{\langle E[R], w, \mathbf{s} \rangle \to \langle E[R'], w', \mathbf{s'} \rangle} \qquad \qquad \frac{\langle R, w, \mathbf{s} \rangle \to \mathbf{fail}}{\langle E[R], w, \mathbf{s} \rangle \to \mathbf{fail}}$$

### Fig. 3: Operational small-step semantics of SPCF

these as the concrete configurations, in contrast with the abstract configurations of our symbolic operational semantics, see Sec. 5.2.)

The small-step reduction relation → is defined in Fig. 3. In the rule for sample, a random value r ∈ (0, 1) is generated and recorded in the trace, while the weight remains unchanged: in a uniform distribution on (0, 1) each value is drawn with likelihood 1. In the rule for score(r), the current weight is multiplied by nonnegative r ∈ **R**: typically this reflects the likelihood of the current execution given some observed data. Similarly to [8] we reduce terms which cannot be reduced in a reasonable way (i.e. scoring with negative constants or evaluating functions outside their domain) to fail.

Example 2. We present a possible reduction sequence for the program in Ex. 1:

$$\begin{aligned} \langle \mathsf{Ped}, 1, [] \rangle &\to^{*} \left\langle \begin{array}{l} \mathsf{let}\ x = \underline{0.2} \cdot \underline{3}\ \mathsf{in}\\ \mathsf{let}\ d = \mathsf{walk}\ x\ \mathsf{in}\\ \mathsf{let}\ w = \mathsf{score}(\mathsf{pdf}_{\mathcal{N}(1.1,0.1)}(d))\ \mathsf{in}\ x \end{array}, 1, [0.2] \right\rangle \\ &\to^{*} \left\langle \begin{array}{l} \mathsf{let}\ d = \mathsf{walk}\ \underline{0.6}\ \mathsf{in}\\ \mathsf{let}\ w = \mathsf{score}(\mathsf{pdf}_{\mathcal{N}(1.1,0.1)}(d))\ \mathsf{in}\ \underline{0.6} \end{array}, 1, [0.2] \right\rangle \\ &\to^{*} \left\langle \mathsf{let}\ w = \mathsf{score}(\mathsf{pdf}_{\mathcal{N}(1.1,0.1)}(\underline{0.9}))\ \mathsf{in}\ \underline{0.6}, 1, [0.2, 0.9, 0.7] \right\rangle \qquad (*) \\ &\to^{*} \left\langle \mathsf{let}\ w = \mathsf{score}(\underline{0.54})\ \mathsf{in}\ \underline{0.6}, 1, [0.2, 0.9, 0.7] \right\rangle \\ &\to^{*} \left\langle \underline{0.6}, 0.54, [0.2, 0.9, 0.7] \right\rangle \end{aligned}$$

In this execution, the initial sample yields 0.2, which is appended to the trace. At step (∗), we assume given a reduction sequence ⟨walk 0.6, 1, [0.2]⟩ →<sup>∗</sup> ⟨0.9, 1, [0.2, 0.9, 0.7]⟩; this means that in the call to walk, 0.9 was sampled as the step size and 0.7 as the direction factor; this makes the new location −0.3, which is negative, so the return value is 0.9. In the final steps, we perform conditioning using the likelihood of observing 0.9 given the data 1.1: the score expression updates the current weight with the density of 0.9 in the normal distribution with parameters (1.1, 0.1).
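For intuition, this run can be replayed in ordinary code. The following Python sketch is illustrative only: the name `run_pedestrian` and the exact shape of the `walk` loop are our assumptions, reconstructed from the reduction above. Samples are recorded in a trace, and the score step multiplies the weight by the density of the observed distance.

```python
import math
import random

def pdf_normal(mean, sd, x):
    # density of the normal distribution N(mean, sd) at x
    return math.exp(-((x - mean) ** 2) / (2 * sd * sd)) / (sd * math.sqrt(2 * math.pi))

def run_pedestrian(rng=random.random):
    """One run of the pedestrian model: returns (value, weight, trace)."""
    trace = []
    def sample():
        r = rng()          # draw from (0, 1) and record it in the trace
        trace.append(r)
        return r
    x = sample() * 3       # initial position, uniform on (0, 3)
    pos, dist = x, 0.0
    while pos > 0:         # random walk until the origin is crossed
        step = sample()
        dist += step       # total distance travelled so far
        pos = pos + step if sample() <= 0.5 else pos - step
    weight = pdf_normal(1.1, 0.1, dist)   # condition on the observed distance 1.1
    return x, weight, trace
```

Replaying it with the fixed trace [0.2, 0.9, 0.7] reproduces the configuration ⟨0.6, 0.54, [0.2, 0.9, 0.7]⟩ reached above. (With a genuinely random source the loop terminates almost surely but its run length is unbounded.)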

**Value and Weight Functions.** Using the relation →, we now aim to reason more globally about probabilistic programs in terms of the traces they produce. Let M be an SPCF term with free variables amongst x<sub>1</sub>,...,x<sub>m</sub>, all of type R. Its *value function* value<sub>M</sub> : **R**<sup>m</sup> × **S** → Λ<sup>0</sup><sub>v</sub> ∪ {⊥} returns, given values for each free variable and a trace, the output value of the program, if the program terminates in a value. The *weight function* weight<sub>M</sub> : **R**<sup>m</sup> × **S** → **R**<sub>≥0</sub> returns the final weight of the corresponding execution. Formally:

$$\begin{aligned} \mathsf{value}_M(\mathbf{r}, \mathbf{s}) &:= \begin{cases} V & \text{if } \langle M[\underline{\mathbf{r}}/\mathbf{x}], 1, [] \rangle \to^* \langle V, w, \mathbf{s} \rangle \\ \bot & \text{otherwise} \end{cases} \\\\ \mathsf{weight}_M(\mathbf{r}, \mathbf{s}) &:= \begin{cases} w & \text{if } \langle M[\underline{\mathbf{r}}/\mathbf{x}], 1, [] \rangle \to^* \langle V, w, \mathbf{s} \rangle \\ 0 & \text{otherwise} \end{cases} \end{aligned}$$

For closed SPCF terms M we just write weight<sub>M</sub>(*s*) for weight<sub>M</sub>([], *s*) (similarly for value<sub>M</sub>), and it follows already from [8, Lemma 9] that the functions value<sub>M</sub> and weight<sub>M</sub> are measurable (see also Sec. 4.1).
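Concretely, value<sub>M</sub> and weight<sub>M</sub> are deterministic functions of the trace: a run is replayed against a given trace, and any mismatch in trace length falls under the "otherwise" clauses. A minimal Python sketch, using an encoding of our own in which a term is a function that receives a `sample` primitive and returns a value together with its accumulated weight:

```python
def value_and_weight(M, s):
    """Compute (value_M(s), weight_M(s)) for a closed term M by replaying
    the trace s; returns (None, 0.0) when the run does not consume
    exactly the trace s, mirroring the 'otherwise' clauses."""
    it = iter(s)
    used = 0
    def sample():
        nonlocal used
        r = next(it, None)
        if r is None:                 # trace too short for this run
            raise LookupError("trace exhausted")
        used += 1
        return r
    try:
        v, w = M(sample)
    except LookupError:
        return None, 0.0
    if used != len(s):                # unused suffix: s is not this run's trace
        return None, 0.0
    return v, w

def add_two(sample):
    # the term sample + sample, which scores nothing (weight 1)
    return sample() + sample(), 1.0
```

Here `value_and_weight(add_two, [0.3, 0.4])` yields the value 0.7 with weight 1, while traces of length 1 or 3 are mapped to (⊥, 0).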

Finally, every closed SPCF term M has an associated *value measure*

$$\llbracket M \rrbracket : \Sigma_{\Lambda^0_v} \to \mathbb{R}_{\geq 0}$$

defined by $\llbracket M \rrbracket(U) := \int_{\mathsf{value}_M^{-1}(U)} \mathsf{weight}_M \, \mathrm{d}\mu_{\mathbb{S}}$. This corresponds to the denotational semantics of SPCF in the ω-quasi-Borel space model via computational adequacy [51].

Returning to Remark 1, what are the connections, if any, between the two types of density of a program? To distinguish them, let us refer to the weight function weight<sub>M</sub> of the program as its *trace density*, and to the Radon-Nikodym derivative $\frac{\mathrm{d}\llbracket M \rrbracket}{\mathrm{d}\nu}$ of the program's value measure, where ν is the reference measure on the measurable space $(\Lambda^0_v, \Sigma_{\Lambda^0_v})$, as its *output density*. Observe that, for any measurable function $f : \Lambda^0_v \to [0, \infty]$,

$$\int_{\Lambda^0_v} f \, \mathrm{d}\llbracket M \rrbracket = \int_{\mathsf{value}_M^{-1}(\Lambda^0_v)} \mathsf{weight}_M \cdot (f \circ \mathsf{value}_M) \, \mathrm{d}\mu_{\mathbb{S}} = \int_{\mathbb{S}} \mathsf{weight}_M \cdot (f \circ \mathsf{value}_M) \, \mathrm{d}\mu_{\mathbb{S}}$$

(because if $\mathbf{s} \notin \mathsf{value}_M^{-1}(\Lambda^0_v)$ then $\mathsf{weight}_M(\mathbf{s}) = 0$). It follows that we can express any expectation w.r.t. the output density as an expectation w.r.t. the trace density weight<sub>M</sub>. If our aim is, instead, to generate samples from the output density, then we can simply generate samples from the trace density weight<sub>M</sub>, and deterministically convert each sample to the space $(\Lambda^0_v, \Sigma_{\Lambda^0_v})$ via the value function value<sub>M</sub>. In other words, if our intended output is just a sequence of samples, then our inference engine does not need to concern itself with the consequences of change of variables.
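This identity is exactly what licenses trace-based importance sampling. As an illustration of our own (not part of the formal development), consider a toy term that draws x uniformly on (0, 1) and scores with 2x, so its trace density is the already-normalised density 2x; a self-normalised estimator computes expectations w.r.t. ⟦M⟧ from runs drawn under μ<sub>**S**</sub>:

```python
import random

def run_M(sample):
    """Toy term M: draw x ~ U(0,1), score(2*x), return x.
    Returns the pair (value, weight) of one run."""
    x = sample()
    return x, 2 * x

def expectation(term, f, n=100_000, seed=0):
    """Self-normalised importance sampling:
    E_{[[M]]}[f] = E_{mu_S}[weight * f(value)] / E_{mu_S}[weight]."""
    rng = random.Random(seed)
    num = den = 0.0
    for _ in range(n):
        v, w = term(rng.random)
        num += w * f(v)
        den += w
    return num / den
```

For this term the posterior mean is analytically ∫ x · 2x dx = 2/3, and the estimate converges to that value.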

# **4 Differentiability of the Weight and Value Functions**

To reason about the differential properties of these functions we place ourselves in a setting in which differentiation makes sense. We start with some preliminaries.

### **4.1 Background on Differentiable Functions**

Basic real analysis gives a standard notion of differentiability at a point *x* ∈ **R**<sup>n</sup> for functions between Euclidean spaces. In this context a function f : **R**<sup>n</sup> → **R**<sup>m</sup> is *smooth* on an open U ⊆ **R**<sup>n</sup> if it has derivatives of all orders at every point of U. The theory of differential geometry (see e.g. the textbooks [50,29,28]) abstracts away from Euclidean spaces to smooth manifolds. We recall the formal definitions.

A topological space M is *locally Euclidean at a point* x ∈ M if x has a neighbourhood U such that there is a homeomorphism φ from U onto an open subset of **R**<sup>n</sup>, for some n. The pair (U, φ : U → **R**<sup>n</sup>) is called a *chart* (of dimension n). We say M is *locally Euclidean* if it is locally Euclidean at every point. A *manifold* M is a Hausdorff, second countable, locally Euclidean space.

Two charts (U, φ : U → **R**<sup>n</sup>) and (V, ψ : V → **R**<sup>m</sup>) are *compatible* if the function ψ ◦ φ<sup>−1</sup> : φ(U ∩ V) → ψ(U ∩ V) is smooth, with a smooth inverse. An *atlas* on M is a family {(U<sub>α</sub>, φ<sub>α</sub>)} of pairwise compatible charts that cover M. A *smooth manifold* is a manifold equipped with an atlas.

It follows from the topological invariance of dimension that charts that cover a part of the same connected component have the same dimension. We emphasise that, although this might be considered slightly unusual, distinct connected components need not have the same dimension. This is important for our purposes: **S** is easily seen to be a smooth manifold, since each connected component **S**<sub>i</sub> is diffeomorphic to **R**<sup>i</sup>. It is also straightforward to endow the set Λ of SPCF terms with a (smooth) manifold structure. Following [8] we view Λ as $\bigsqcup_{m \in \mathbb{N}} \mathsf{SK}_m \times \mathbb{R}^m$, where SK<sub>m</sub> is the set of SPCF terms with exactly m place-holders (a.k.a. skeleton terms) for numerals. Thus identified, we give Λ the countable disjoint union topology of the product of the discrete topology on SK<sub>m</sub> and the standard topology on **R**<sup>m</sup>. Note that the connected components of Λ have the form {M} × **R**<sup>m</sup>, with M ranging over SK<sub>m</sub> and m over **N**. So in particular the subspace Λ<sub>v</sub> ⊆ Λ of values inherits the manifold structure. We fix the Borel algebra of this topology to be the σ-algebra on Λ.

Given manifolds (M, {(U<sub>α</sub>, φ<sub>α</sub>)}) and (M′, {(V<sub>β</sub>, ψ<sub>β</sub>)}), a function f : M → M′ is *differentiable* at a point x ∈ M if there are charts (U<sub>α</sub>, φ<sub>α</sub>) about x and (V<sub>β</sub>, ψ<sub>β</sub>) about f(x) such that the composite ψ<sub>β</sub> ◦ f ◦ φ<sub>α</sub><sup>−1</sup>, restricted to the open subset φ<sub>α</sub>(f<sup>−1</sup>(V<sub>β</sub>) ∩ U<sub>α</sub>), is differentiable at φ<sub>α</sub>(x).

The definitions above are useful because they allow for a uniform presentation. But it is helpful to unpack the definition of differentiability in a few instances, and we see that it boils down to the standard sense in real analysis. Take an SPCF term M with free variables amongst x<sub>1</sub>,...,x<sub>m</sub> (all of type **R**), and (*r*, *s*) ∈ **R**<sup>m</sup> × **S**<sub>n</sub>. Then value<sub>M</sub> is differentiable at (*r*, *s*) just if there is an open neighbourhood U ⊆ **R**<sup>m</sup> × **S**<sub>n</sub> of (*r*, *s*) such that either:

1. value<sub>M</sub>(*r′*, *s′*) = ⊥ for all (*r′*, *s′*) ∈ U; or
2. value<sub>M</sub>(*r′*, *s′*) ≠ ⊥ for all (*r′*, *s′*) ∈ U, and the real-valued function value′<sub>M</sub> : U → **R** is differentiable at (*r*, *s*), where we define value′<sub>M</sub>(*r′*, *s′*) := r whenever value<sub>M</sub>(*r′*, *s′*) = <u>r</u>.

### **4.2 Why Almost Everywhere Differentiability Can Fail**

Conditional statements break differentiability. This is easy to see with an example: the weight function of the term

$$\text{if}\left(\text{sample}\leq \textsf{sample}, \textsf{score}(\textsf{1}), \textsf{score}(\textsf{0})\right)$$

is exactly the characteristic function of {[s<sub>1</sub>, s<sub>2</sub>] ∈ **S** | s<sub>1</sub> ≤ s<sub>2</sub>}, which is not differentiable on the diagonal {[s, s] ∈ **S**<sub>2</sub> | s ∈ (0, 1)}.

This function is however differentiable almost everywhere: the diagonal is an uncountable set but has Leb<sub>2</sub> measure zero in the space **S**<sub>2</sub>. Unfortunately, this is not true in general. Without sufficient restrictions, conditional statements also break almost everywhere differentiability. This can happen for two reasons.
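To make the situation concrete, here is a small Python check (illustrative only) that the weight function above is locally constant, hence with zero finite-difference gradient, off the diagonal, while it jumps on the diagonal:

```python
def weight_fn(s1, s2):
    """Weight function of if(sample <= sample, score(1), score(0)):
    the characteristic function of {(s1, s2) | s1 <= s2}."""
    return 1.0 if s1 <= s2 else 0.0

def finite_diff_grad(s1, s2, h=1e-9):
    """Central finite-difference gradient; (0, 0) wherever the function
    is locally constant, and a huge spike across the jump."""
    d1 = (weight_fn(s1 + h, s2) - weight_fn(s1 - h, s2)) / (2 * h)
    d2 = (weight_fn(s1, s2 + h) - weight_fn(s1, s2 - h)) / (2 * h)
    return d1, d2
```

Away from the diagonal the gradient is (0, 0); at a diagonal point such as (0.3, 0.3) the finite difference diverges, witnessing the non-differentiability.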

**Problem 1: Pathological Primitive Functions.** Recall that our definition of SPCF is parametrised by a set F of primitive functions. It is tempting in this context to take F to be the set of all differentiable functions, but this is too general, as we show now. Consider that for every f : **R** → **R** the term

$$\text{if}\left(\underline{f}(\texttt{sample}) \le 0, \texttt{score}(\underline{1}), \texttt{score}(\underline{0})\right)$$

has as weight function the characteristic function of {[s<sub>1</sub>] ∈ **S** | f(s<sub>1</sub>) ≤ 0}. This function is non-differentiable at every s<sub>1</sub> ∈ **S**<sub>1</sub> ∩ ∂f<sup>−1</sup>(−∞, 0]: in every neighbourhood of s<sub>1</sub> there are s′<sub>1</sub> and s″<sub>1</sub> such that f(s′<sub>1</sub>) ≤ 0 and f(s″<sub>1</sub>) > 0. One can construct a differentiable f for which this is not a measure zero set. (For example, there exists a non-negative differentiable function f which is zero exactly on a fat Cantor set, i.e. a Cantor-like set with strictly positive measure. See [43, Ex. 5.21].)
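For intuition on fat Cantor sets: in the Smith-Volterra-Cantor construction on [0, 1] one removes, at stage n, 2<sup>n−1</sup> middle open intervals of length 4<sup>−n</sup> each, so the total length removed is Σ<sub>n≥1</sub> 2<sup>n−1</sup>/4<sup>n</sup> = 1/2 and the surviving set has Lebesgue measure 1/2 > 0. A short exact computation (our own illustration; the function f itself is the one from [43]):

```python
from fractions import Fraction

def fat_cantor_measure(stages=30):
    """Lebesgue measure remaining after `stages` steps of the
    Smith-Volterra-Cantor construction on [0, 1]: stage n removes
    2^(n-1) intervals of length 4^(-n) each."""
    removed = Fraction(0)
    for n in range(1, stages + 1):
        removed += Fraction(2 ** (n - 1), 4 ** n)
    return 1 - removed    # decreases to 1/2 > 0 as stages -> infinity
```

After 30 stages the remaining measure is 1/2 + 2<sup>−31</sup>, already within 10<sup>−9</sup> of the limit 1/2, which is strictly positive.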

**Problem 2: Non-Terminating Runs.** Our language has recursion, so we can construct a term which samples a random number, halts if this number is in **Q** ∩ [0, 1], and diverges otherwise. In pseudo-code:

```
let rec enumQ p q r =
  if (r = p/q) then (score 1) else
  if (r < p/q) then
    enumQ p (q+1) r
  else
    enumQ (p+1) q r
in enumQ 0 1 sample
```
The induced weight function is the characteristic function of {[s1] ∈ **S** | s<sup>1</sup> ∈ **Q**}; the set of points at which this function is non-differentiable is **S**1, which has measure 1.
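The recursion can be replayed with exact rationals. In the following Python sketch (an illustration of ours: it returns the number of recursive steps instead of scoring, and a fuel bound stands in for the divergence that occurs on irrational inputs):

```python
from fractions import Fraction

def enumQ(r, max_steps=10_000):
    """Search for r among the rationals, following the recursion of
    enumQ: returns the step count at which r = p/q is found, or None
    if the fuel bound is exhausted (a run on an irrational r would
    recurse forever)."""
    p, q = 0, 1
    for step in range(max_steps):
        if r == Fraction(p, q):
            return step            # the program would score 1 and halt here
        if r < Fraction(p, q):
            q += 1                 # enumQ p (q+1) r
        else:
            p += 1                 # enumQ (p+1) q r
    return None
```

For instance `enumQ(Fraction(1, 3))` halts after 3 steps, whereas feeding it an irrational sample would exhaust any fuel bound.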

We proceed to overcome Problem 1 by making appropriate assumptions on the set of primitives. We will then address Problem 2 by focusing on almost surely terminating programs.

### **4.3 Admissible Primitive Functions**

One contribution of this work is to identify sufficient conditions for F. We will show in Sec. 6 that our main result holds provided:

**Assumption 1** (Admissible Primitive Functions)**.** F is a set of partial, measurable functions **R**<sup>ℓ</sup> ⇀ **R** (for ℓ ∈ **N**), including all constant and projection functions, which satisfies:

1. if f : **R**<sup>ℓ</sup> ⇀ **R** and g<sub>i</sub> : **R**<sup>m</sup> ⇀ **R** are elements of F for i = 1,...,ℓ, then f ◦ ⟨g<sub>i</sub>⟩<sub>i=1</sub><sup>ℓ</sup> : **R**<sup>m</sup> ⇀ **R** is in F;
2. if (f : **R**<sup>ℓ</sup> ⇀ **R**) ∈ F, then f is differentiable in the interior of dom(f);
3. if (f : **R**<sup>ℓ</sup> ⇀ **R**) ∈ F, then Leb<sub>ℓ</sub>(∂f<sup>−1</sup>[0, ∞)) = 0.

Example 3. The following sets of primitive operations satisfy the above sufficient conditions. (See [34] for a proof.)


Note that all primitive functions mentioned in our examples (and in particular the density of the normal distribution) are included in both F<sup>1</sup> and F2.

It is worth noting that both F<sub>1</sub> and F<sub>2</sub> satisfy the following property, which is stronger than Assumption 1.3: Leb<sub>ℓ</sub>(∂f<sup>−1</sup>I) = 0 for every interval I and every primitive function f.

<sup>3</sup> This requirement is crucial, and cannot be relaxed.

<sup>4</sup> i.e. a finite union of I<sub>1</sub> × ··· × I<sub>ℓ</sub> for (possibly unbounded) intervals I<sub>i</sub>

### **4.4 Almost Sure Termination**

To rule out the contrived counterexamples which diverge we restrict attention to almost surely terminating SPCF terms. Intuitively, a program M (closed term of ground type) is almost surely terminating if the probability that a run of M terminates is 1.

Take an SPCF term M with variables amongst x1,...,x<sup>m</sup> (all of type **R**), and set

$$\mathbb{T}_{M,\mathsf{term}} := \left\{ (\mathbf{r}, \mathbf{s}) \in \mathbb{R}^m \times \mathbb{S} \mid \exists V, w. \; \langle M[\underline{\mathbf{r}}/\mathbf{x}], 1, [] \rangle \to^* \langle V, w, \mathbf{s} \rangle \right\}. \tag{1}$$

Let us first consider the case of closed M ∈ Λ<sup>0</sup>, i.e. m = 0 (notice that the measure μ<sub>**R**<sup>m</sup>×**S**</sub> is not finite for m ≥ 1). As **T**<sub>M,term</sub> now coincides with value<sub>M</sub><sup>−1</sup>(Λ<sup>0</sup><sub>v</sub>), **T**<sub>M,term</sub> is a measurable subset of **S**. Plainly, if M is deterministic (i.e. sample-free), then μ<sub>**S**</sub>(**T**<sub>M,term</sub>) = 1 if M converges to a value, and 0 otherwise. Generally, for an arbitrary (stochastic) term M we can regard μ<sub>**S**</sub>(**T**<sub>M,term</sub>) as the probability that a run of M converges to a value, because of Lem. 1.

**Lemma 1.** If <sup>M</sup> <sup>∈</sup> <sup>Λ</sup><sup>0</sup> then <sup>μ</sup>**S**(**T**M,term) <sup>≤</sup> <sup>1</sup>.

More generally, if M has free variables amongst x1,...,x<sup>m</sup> (all of type R), then we say that M is almost surely terminating if for almost every (instantiation of the free variables by) *<sup>r</sup>* <sup>∈</sup> **<sup>R</sup>**<sup>m</sup>, <sup>M</sup>[*r*/*x*] terminates with probability 1.
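For intuition, μ<sub>**S**</sub>(**T**<sub>M,term</sub>) can be estimated by Monte Carlo, with divergence approximated by a step bound (an illustration of ours, not a decision procedure; `geometric_run` and `biased_run` are toy programs of our invention):

```python
import random

def geometric_run(rng, fuel=1_000):
    """while sample <= 0.5: skip -- terminates almost surely."""
    for _ in range(fuel):
        if rng.random() > 0.5:
            return True        # run terminated
    return False               # still running after `fuel` samples

def biased_run(rng, fuel=1_000):
    """if sample <= 0.3 then diverge else halt -- terminates w.p. 0.7."""
    if rng.random() <= 0.3:
        return False           # this branch would loop forever
    return True

def estimate_termination(run, n=20_000, seed=1):
    """Monte Carlo estimate of the termination probability, with
    non-termination approximated by exhausting the step bound."""
    rng = random.Random(seed)
    return sum(run(rng) for _ in range(n)) / n
```

The first program is estimated to terminate with probability essentially 1, the second with probability close to 0.7, matching the measures of their respective sets **T**<sub>M,term</sub>.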

We formalise the notion of almost sure termination as follows.

### **Definition 1.** Let M be an SPCF term. We say that M *terminates almost surely* if


Suppose that M is a closed term and M<sup>♭</sup> is obtained from M by recursively replacing subterms score(L) with the term if(L < 0, N<sub>fail</sub>, L), where N<sub>fail</sub> is a term that reduces to fail, such as 1/0. It is easy to see that for all *s* ∈ **S**, ⟨M<sup>♭</sup>, 1, []⟩ →<sup>∗</sup> ⟨V, 1, *s*⟩ iff, for some (unique) w ∈ **R**<sub>≥0</sub>, ⟨M, 1, []⟩ →<sup>∗</sup> ⟨V, w, *s*⟩. Therefore,

$$\begin{aligned} \llbracket M^\flat \rrbracket(\Lambda^0_v) &= \int_{\mathsf{value}_{M^\flat}^{-1}(\Lambda^0_v)} \mathsf{weight}_{M^\flat} \, \mathrm{d}\mu_{\mathbb{S}} \\ &= \mu_{\mathbb{S}}(\{\mathbf{s} \in \mathbb{S} \mid \exists V.\ \langle M^\flat, 1, [] \rangle \to^* \langle V, 1, \mathbf{s} \rangle\}) = \mu_{\mathbb{S}}(\mathbb{T}_{M,\mathsf{term}}) \end{aligned}$$

Consequently, the closed term M terminates almost surely iff ⟦M<sup>♭</sup>⟧ is a probability measure.

Remark 2. **–** Like many treatments of the semantics of probabilistic programs in the literature, we make no distinction between non-terminating runs and aborted runs of a (closed) term M: both could result in the value measure ⟦M⟧ being a sub-probability measure (cf. [4]).

**–** Even so, current probabilistic programming systems do not place any restrictions on the code that users can write: it is perfectly possible to construct invalid models, because catching programs that do not define valid probability distributions can be hard, or even impossible. This is not surprising, as almost sure termination is hard to decide: it is Π<sup>0</sup><sub>2</sub>-complete in the arithmetical hierarchy [22]. Nevertheless, because a.s. termination is an important correctness property of probabilistic programs (not least because of the main result of this paper, Thm. 3), the development of methods to prove a.s. termination is a hot research topic.

Accordingly the main theorem of this paper is stated as follows:

**Theorem 3.** Let M be an SPCF term (possibly with free variables of type R) which terminates almost surely. Then its weight function weight<sup>M</sup> and value function value<sup>M</sup> are differentiable almost everywhere.

# **5 Stochastic Symbolic Execution**

We have seen that a source of discontinuity is the use of if-statements. Our main result therefore relies on an in-depth understanding of the branching behaviour of programs. The operational semantics given in Sec. 3 is unsatisfactory in this respect: any two execution paths are treated independently, whether they go through different branches of an if-statement or one is obtained from the other by using slightly perturbed random samples not affecting the control flow.

More concretely, note that although we have derived weight<sub>Ped</sub>[0.2, 0.9, 0.7] = 0.54 and value<sub>Ped</sub>[0.2, 0.9, 0.7] = 0.6 in Ex. 2, we cannot infer anything about weight<sub>Ped</sub>[0.21, 0.91, 0.71] and value<sub>Ped</sub>[0.21, 0.91, 0.71] unless we perform the corresponding reduction.

So we propose an alternative symbolic operational semantics (similar to the "compilation scheme" in [55]), in which no sampling is performed: whenever a sample command is encountered, we simply substitute a fresh variable α<sup>i</sup> for it, and continue on with the execution. We can view this style of semantics as a stochastic form of symbolic execution [12,23], i.e., a means of analysing a program so as to determine what inputs, and random draws (from sample) cause each part of a program to execute.

Consider the term M ≡ let x = sample · 3 in (walk x), defined using the function walk of Ex. 1. We have a reduction path

$$M \Rightarrow \mathsf{let}\ x = \alpha_1 \cdot \underline{3}\ \mathsf{in}\ (\mathsf{walk}\ x) \Rightarrow \mathsf{walk}\ (\alpha_1 \cdot \underline{3})$$

but at this point we are stuck: the CBV strategy requires a value for α<sub>1</sub>. We will "delay" the evaluation of the multiplication α<sub>1</sub> · 3; we signal this by drawing a box around the delayed operation: $\boxed{\alpha_1 \cdot \underline{3}}$. We continue the execution, inspecting the definition of walk, and get:

$$M \Rightarrow^* \mathsf{walk}\,(\boxed{\alpha_1 \cdot \underline{3}}) \Rightarrow^* N \equiv \mathsf{if}\,(\boxed{\alpha_1 \cdot \underline{3}} \le 0, \underline{0}, P).$$

where

$$P \equiv \begin{pmatrix} \mathsf{let}\ s = \mathsf{sample}\ \mathsf{in} \\ \mathsf{if}\,\big(\mathsf{sample} \le \underline{0.5},\ s + \mathsf{walk}(\boxed{\alpha_1 \cdot \underline{3}} + s),\ s + \mathsf{walk}(\boxed{\alpha_1 \cdot \underline{3}} - s)\big) \end{pmatrix}.$$

We are stuck again: the value of α<sup>1</sup> is needed in order to know which branch to follow. Our approach consists in considering the space **S**<sup>1</sup> = (0, 1) of possible values for α1, and splitting it into {s<sup>1</sup> ∈ (0, 1) | s<sup>1</sup> · 3 ≤ 0} = ∅ and {s<sup>1</sup> ∈ (0, 1) | s<sup>1</sup> · 3 > 0} = (0, 1). Each of the two branches will then yield a weight function restricted to the appropriate subspace.
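The splitting step can be sketched in code: a branch subspace is a predicate on traces, and an if-statement splits it along its guard. In the following Python illustration of ours, the measures of the resulting subspaces are estimated by Monte Carlo rather than computed symbolically:

```python
import random

def split(U, guard):
    """Split a subspace U (a predicate on traces) along the test
    guard(s) <= 0, as the symbolic semantics does at an if-statement."""
    left  = lambda s: U(s) and guard(s) <= 0
    right = lambda s: U(s) and guard(s) > 0
    return left, right

def volume(pred, n_vars, trials=50_000, seed=2):
    """Monte Carlo estimate of the Lebesgue measure of pred in (0,1)^n."""
    rng = random.Random(seed)
    hits = sum(pred([rng.random() for _ in range(n_vars)]) for _ in range(trials))
    return hits / trials
```

For the branch above, `split(lambda s: True, lambda s: s[0] * 3)` produces a left subspace of estimated measure 0 (the guard s<sub>1</sub> · 3 ≤ 0 is unsatisfiable on (0, 1)) and a right subspace of estimated measure 1.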

Formally, our symbolic operational semantics is a rewrite system of configurations of the form ⟪*M* ,*w*, U⟫, where *M* is a term with delayed (boxed) operations, and free "sampling" variables<sup>5</sup> <sup>α</sup>1,...,αn; <sup>U</sup> <sup>⊆</sup> **<sup>S</sup>**<sup>n</sup> is the subspace of sampling values compatible with the current branch; and *w* : U → **R**≥<sup>0</sup> is a function assigning to each *<sup>s</sup>* <sup>∈</sup> <sup>U</sup> a weight *<sup>w</sup>*(*s*). In particular, for our running example<sup>6</sup>

⟪M, **λ**[]. 1, **S**0⟫ ⇒<sup>∗</sup> ⟪N, **λ**[s1]. 1,(0, 1)⟫.

As explained above, this leads to two branches:

$$\langle\!\langle N, \boldsymbol{\lambda}[s_1].\,1, (0, 1) \rangle\!\rangle \Rightarrow \begin{cases} \langle\!\langle \underline{0}, \boldsymbol{\lambda}[s_1].\,1, \emptyset \rangle\!\rangle \\ \langle\!\langle P, \boldsymbol{\lambda}[s_1].\,1, (0, 1) \rangle\!\rangle \end{cases}$$

The first branch has reached a value, and the reader can check that the second branch continues as

$$\begin{aligned} &\langle\!\langle P, \boldsymbol{\lambda}[s_1].\,1, (0, 1) \rangle\!\rangle \Rightarrow^* \\ &\quad \langle\!\langle \mathsf{if}\,(\alpha_3 \le \underline{0.5},\ \alpha_2 + \mathsf{walk}(\boxed{\alpha_1 \cdot \underline{3}} + \alpha_2),\ \alpha_2 + \mathsf{walk}(\boxed{\alpha_1 \cdot \underline{3}} - \alpha_2)), \boldsymbol{\lambda}[s_1, s_2, s_3].\,1, (0, 1)^3 \rangle\!\rangle \end{aligned}$$

where α<sup>2</sup> and α<sup>3</sup> stand for the two sample statements in P. From here we proceed by splitting (0, 1)<sup>3</sup> into (0, 1) <sup>×</sup> (0, 1) <sup>×</sup> (0, <sup>0</sup>.5] and (0, 1) <sup>×</sup> (0, 1) <sup>×</sup> (0.5, 1) and after having branched again (on whether we have passed 0) the evaluation of walk can terminate in the configuration

$$\langle\!\langle \boxed{\alpha_2 + \underline{0}}, \boldsymbol{\lambda}[s_1, s_2, s_3].\,1, U \rangle\!\rangle$$

where U := {[s1, s2, s3] ∈ **S**<sup>3</sup> | s<sup>3</sup> > 0.5 ∧ s<sup>1</sup> · 3 − s<sup>2</sup> ≤ 0}.

Recall that M appears in the context of our running example Ped. Using our calculations above we derive one of its branches:

$$\begin{aligned} \langle\!\langle \mathsf{Ped}, \boldsymbol{\lambda}[].\,1, \mathbb{S}_0 \rangle\!\rangle &\Rightarrow^{*} \langle\!\langle \mathsf{let}\ w = \mathsf{score}(\mathsf{pdf}_{\mathcal{N}(1.1,0.1)}(\alpha_2))\ \mathsf{in}\ \boxed{\alpha_1 \cdot \underline{3}}, \boldsymbol{\lambda}[s_1, s_2, s_3].\,1, U \rangle\!\rangle \\ &\Rightarrow \langle\!\langle \mathsf{let}\ w = \mathsf{score}(\boxed{\mathsf{pdf}_{\mathcal{N}(1.1,0.1)}(\alpha_2)})\ \mathsf{in}\ \boxed{\alpha_1 \cdot \underline{3}}, \boldsymbol{\lambda}[s_1, s_2, s_3].\,1, U \rangle\!\rangle \\ &\Rightarrow \langle\!\langle \mathsf{let}\ w = \boxed{\mathsf{pdf}_{\mathcal{N}(1.1,0.1)}(\alpha_2)}\ \mathsf{in}\ \boxed{\alpha_1 \cdot \underline{3}}, \boldsymbol{\lambda}[s_1, s_2, s_3].\,\mathsf{pdf}_{\mathcal{N}(1.1,0.1)}(s_2), U \rangle\!\rangle \\ &\Rightarrow^{*} \langle\!\langle \boxed{\alpha_1 \cdot \underline{3}}, \boldsymbol{\lambda}[s_1, s_2, s_3].\,\mathsf{pdf}_{\mathcal{N}(1.1,0.1)}(s_2), U \rangle\!\rangle \end{aligned}$$

<sup>5</sup> Note that *M* may be open and contain other free "non-sampling" variables, usually denoted x1,...,xm.

<sup>6</sup> We use the meta-lambda-abstraction **λ**x. f(x) to denote the set-theoretic function x → f(x).

In particular the trace [0.2, 0.9, 0.7] of Ex. 2 lies in the subspace U. We can immediately read off the corresponding value and weight functions for all [s<sub>1</sub>, s<sub>2</sub>, s<sub>3</sub>] ∈ U, simply by evaluating the computation $\boxed{\alpha_1 \cdot \underline{3}}$, which we have delayed until now:

$$\mathsf{value}_{\mathsf{Ped}}[s_1, s_2, s_3] = s_1 \cdot 3 \qquad\qquad \mathsf{weight}_{\mathsf{Ped}}[s_1, s_2, s_3] = \mathsf{pdf}_{\mathcal{N}(1.1,0.1)}(s_2)$$
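As a sanity check (our own code, with `in_U` and the closed forms transcribed from above), one can verify that the trace of Ex. 2 lies in U, and that the closed-form value and weight agree with the concrete run of Ex. 2:

```python
import math

def pdf_normal(mean, sd, x):
    return math.exp(-((x - mean) ** 2) / (2 * sd * sd)) / (sd * math.sqrt(2 * math.pi))

def in_U(s):
    """U = {[s1, s2, s3] in S_3 | s3 > 0.5 and s1*3 - s2 <= 0}."""
    s1, s2, s3 = s
    return s3 > 0.5 and s1 * 3 - s2 <= 0

def value_weight_on_U(s):
    """Closed-form value and weight functions read off the symbolic branch."""
    s1, s2, s3 = s
    assert in_U(s)
    return s1 * 3, pdf_normal(1.1, 0.1, s2)
```

On [0.2, 0.9, 0.7] this returns value 0.6 and weight ≈ 0.54, exactly the final configuration of Ex. 2; a trace with direction factor 0.4 instead falls outside U and belongs to a different branch.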

### **5.1 Symbolic Terms and Values**

We have just described informally our symbolic execution approach, which involves delaying the evaluation of primitive operations. We make this formal by introducing an extended notion of terms, which we call *symbolic terms* and define in Fig. 4a along with a notion of *symbolic values*. For this we assume fixed denumerable sequences of *distinguished* variables: α<sub>1</sub>, α<sub>2</sub>,..., used to represent sampling, and x<sub>1</sub>, x<sub>2</sub>,..., used for free variables of type R. Symbolic terms are typically denoted *M*, *N* or *L*. They contain terms of the form $\boxed{f}(\mathcal{V}_1, \dots, \mathcal{V}_\ell)$, for f : **R**<sup>ℓ</sup> ⇀ **R** ∈ F a primitive function, representing delayed evaluations, and they also contain the sampling variables α<sub>j</sub>. The type system is adapted in a straightforward way, see Fig. 4b.

We use Λ<sup>(m,n)</sup> to refer to the set of well-typed symbolic terms with free variables amongst x<sub>1</sub>,...,x<sub>m</sub> and α<sub>1</sub>,...,α<sub>n</sub> (all of type R). Note that every term in the sense of Fig. 2 is also a symbolic term.

Each symbolic term *M* ∈ Λ<sup>(m,n)</sup> has a corresponding set of regular terms, accounting for all possible values for its sampling variables α<sub>1</sub>,...,α<sub>n</sub> and its (other) free variables x<sub>1</sub>,...,x<sub>m</sub>. For *r* ∈ **R**<sup>m</sup> and *s* ∈ **S**<sub>n</sub>, we call *partially evaluated instantiation* of *M* the term ⌈*M*⌉(*r*, *s*) obtained from *M*[*r*/*x*, *s*/*α*] by recursively "evaluating" subterms of the form $\boxed{f}(\underline{r_1}, \dots, \underline{r_\ell})$ to $\underline{f(r_1, \dots, r_\ell)}$, provided (r<sub>1</sub>,...,r<sub>ℓ</sub>) ∈ dom(f). In this operation, unboxed subterms of the form f(<u>r<sub>1</sub></u>,...,<u>r<sub>ℓ</sub></u>) are left unchanged, and so are any other redexes. ⌈*M*⌉ can be viewed as a partial function ⌈*M*⌉ : **R**<sup>m</sup> × **S**<sub>n</sub> ⇀ Λ, and a formal definition is presented in Fig. 5b. (To be completely rigorous, we define, for fixed m and n, partial functions ⌈*M*⌉<sub>(m,n)</sub> : **R**<sup>m</sup> × **S**<sub>n</sub> ⇀ Λ for symbolic terms *M* whose distinguished variables are amongst x<sub>1</sub>,...,x<sub>m</sub> and α<sub>1</sub>,...,α<sub>n</sub>; *M* may contain other variables y, z,... of any type. Since m and n are usually clear from the context, we omit them.) Observe that for *M* ∈ Λ<sup>(m,n)</sup> and (*r*, *s*) ∈ dom⌈*M*⌉, the term ⌈*M*⌉(*r*, *s*) is closed.

Example 4. Consider *M* ≡ (λz. $\boxed{\alpha_1 \cdot \underline{3}}$) (score(pdf<sub>N(1.1,0.1)</sub>(α<sub>2</sub>))). Then, for *r* = [] and *s* = [0.2, 0.9, 0.7], we have ⌈*M*⌉(*r*, *s*) = (λz. <u>0.6</u>) (score(pdf<sub>N(1.1,0.1)</sub>(<u>0.9</u>))).
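The instantiation of the ground fragment can be sketched as a small interpreter (an illustration of ours; only real constants, sampling variables and boxed applications are covered here, whereas Fig. 5 treats all symbolic terms):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Const:
    r: float          # a real constant (an underlined numeral)

@dataclass
class SVar:
    j: int            # sampling variable alpha_j (1-indexed)

@dataclass
class Boxed:
    f: Callable       # a delayed (boxed) primitive application
    args: tuple

def instantiate(term, s):
    """Instantiation on the ground fragment: substitute the trace values
    for the sampling variables and evaluate the boxed applications."""
    if isinstance(term, Const):
        return term.r
    if isinstance(term, SVar):
        return s[term.j - 1]
    if isinstance(term, Boxed):
        return term.f(*(instantiate(a, s) for a in term.args))
    raise TypeError(term)
```

On the boxed multiplication of Ex. 4, `instantiate(Boxed(lambda a, b: a * b, (SVar(1), Const(3.0))), [0.2, 0.9, 0.7])` yields 0.6, matching the example.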

More generally, observe that if Γ ⊢ *M* : σ and (*r*, *s*) ∈ dom⌈*M*⌉ then Γ ⊢ ⌈*M*⌉(*r*, *s*) : σ. In order to evaluate conditionals if(*L* ≤ 0, *M*, *N*) we need to reduce *L* to a real constant, i.e., we need to have ⌈*L*⌉(*r*, *s*) = <u>r</u> for some r ∈ **R**. This is the case whenever *L* is a symbolic value of type R, since these are built only out of delayed operations, real constants and distinguished variables x<sub>i</sub> or α<sub>j</sub>. Indeed we can show the following:

**Lemma 2.** Let (*r*, *s*) ∈ dom⌈*M*⌉. Then *M* is a symbolic value iff ⌈*M*⌉(*r*, *s*) is a value.

$$\begin{aligned} \mathcal{V} &::= \underline{r} \mid \mathsf{x}_i \mid \alpha_j \mid \boxed{f}(\mathcal{V}_1, \dots, \mathcal{V}_\ell) \mid \lambda y.\, \mathcal{M} \\ \mathcal{M}, \mathcal{N}, \mathcal{L} &::= \mathcal{V} \mid y \mid f(\mathcal{M}_1, \dots, \mathcal{M}_\ell) \mid \mathcal{M}\, \mathcal{N} \mid \mathsf{Y}\mathcal{M} \mid \mathsf{if}\,(\mathcal{L} \le 0, \mathcal{M}, \mathcal{N}) \mid \mathsf{sample} \mid \mathsf{score}(\mathcal{M}) \end{aligned}$$

(a) Symbolic values (typically *V* ) and symbolic terms (typically *M* , *N* or *L*)

$$\frac{\Gamma \vdash \mathcal{V}_1 : \mathsf{R} \quad \cdots \quad \Gamma \vdash \mathcal{V}_\ell : \mathsf{R}}{\Gamma \vdash \boxed{f}(\mathcal{V}_1, \dots, \mathcal{V}_\ell) : \mathsf{R}} \qquad \frac{}{\Gamma \vdash \mathsf{x}_i : \mathsf{R}} \qquad \frac{}{\Gamma \vdash \alpha_j : \mathsf{R}} \qquad \frac{}{\Gamma, y : \sigma \vdash y : \sigma} \qquad \frac{r \in \mathbb{R}}{\Gamma \vdash \underline{r} : \mathsf{R}}$$
$$\frac{\Gamma \vdash \mathcal{M}_1 : \mathsf{R} \quad \cdots \quad \Gamma \vdash \mathcal{M}_\ell : \mathsf{R}}{\Gamma \vdash f(\mathcal{M}_1, \dots, \mathcal{M}_\ell) : \mathsf{R}} \qquad \frac{\Gamma, y : \sigma \vdash \mathcal{M} : \tau}{\Gamma \vdash \lambda y.\, \mathcal{M} : \sigma \to \tau} \qquad \frac{\Gamma \vdash \mathcal{M} : \sigma \to \tau \quad \Gamma \vdash \mathcal{N} : \sigma}{\Gamma \vdash \mathcal{M}\, \mathcal{N} : \tau}$$
$$\frac{\Gamma \vdash \mathcal{M} : (\sigma \Rightarrow \tau) \Rightarrow \sigma \Rightarrow \tau}{\Gamma \vdash \mathsf{Y}\mathcal{M} : \sigma \Rightarrow \tau} \qquad \frac{\Gamma \vdash \mathcal{L} : \mathsf{R} \quad \Gamma \vdash \mathcal{M} : \sigma \quad \Gamma \vdash \mathcal{N} : \sigma}{\Gamma \vdash \mathsf{if}\,(\mathcal{L} \le 0, \mathcal{M}, \mathcal{N}) : \sigma} \qquad \frac{}{\Gamma \vdash \mathsf{sample} : \mathsf{R}} \qquad \frac{\Gamma \vdash \mathcal{M} : \mathsf{R}}{\Gamma \vdash \mathsf{score}(\mathcal{M}) : \mathsf{R}}$$

(b) Type system for symbolic terms

$$\begin{aligned} \mathcal{R} &::= (\lambda y.\, \mathcal{M})\ \mathcal{V} \mid f(\mathcal{V}_1, \dots, \mathcal{V}_\ell) \mid \mathsf{Y}(\lambda y.\, \mathcal{M}) \mid \mathsf{if}\,(\mathcal{V} \le 0, \mathcal{M}, \mathcal{N}) \mid \mathsf{sample} \mid \mathsf{score}(\mathcal{V}) \\ \mathcal{E} &::= [\,] \mid \mathcal{E}\ \mathcal{M} \mid (\lambda y.\, \mathcal{M})\ \mathcal{E} \mid f(\mathcal{V}_1, \dots, \mathcal{V}_{i-1}, \mathcal{E}, \mathcal{M}_{i+1}, \dots, \mathcal{M}_\ell) \mid \mathsf{Y}\mathcal{E} \\ &\quad \mid \mathsf{if}\,(\mathcal{E} \le 0, \mathcal{M}, \mathcal{N}) \mid \mathsf{score}(\mathcal{E}) \end{aligned}$$

(c) Symbolic redexes (typically *R*) and symbolic reduction contexts (typically *E*)

Fig. 4: Symbolic terms and values, type system, reduction contexts, and redexes. As usual f ∈ F and r ∈ **R**.

For symbolic values *V* : R and (*r*, *s*) ∈ dom⌈*V*⌉ we employ the notation ⌊*V*⌋(*r*, *s*) := r provided that ⌈*V*⌉(*r*, *s*) = <u>r</u>.

A simple induction on symbolic terms and values yields the following property, which is crucial for the proof of our main result (Thm. 3):

**Lemma 3.** Suppose the set F of primitives satisfies Item 1 of Assumption 1.


### **5.2 Symbolic Operational Semantics**

We aim to develop a symbolic operational semantics that provides a sound and complete abstraction of the (concrete) operational trace semantics. The symbolic

$$\begin{aligned} \mathrm{dom}\lceil \boxed{f}(\mathcal{V}_1, \dots, \mathcal{V}_\ell) \rceil &:= \{(\mathbf{r}, \mathbf{s}) \in \mathrm{dom}\lceil \mathcal{V}_1 \rceil \cap \dots \cap \mathrm{dom}\lceil \mathcal{V}_\ell \rceil \mid (r'_1, \dots, r'_\ell) \in \mathrm{dom}(f), \\ &\qquad\qquad \text{where } r'_i = \lfloor \mathcal{V}_i \rfloor(\mathbf{r}, \mathbf{s}) \text{ for } 1 \le i \le \ell\} \\ \mathrm{dom}\lceil \mathsf{sample} \rceil := \mathrm{dom}\lceil \mathsf{x}_i \rceil &:= \mathrm{dom}\lceil \alpha_j \rceil := \mathrm{dom}\lceil y \rceil := \mathrm{dom}\lceil \underline{r'} \rceil := \mathbb{R}^m \times \mathbb{S}_n \\ \mathrm{dom}\lceil f(\mathcal{M}_1, \dots, \mathcal{M}_\ell) \rceil &:= \mathrm{dom}\lceil \mathcal{M}_1 \rceil \cap \dots \cap \mathrm{dom}\lceil \mathcal{M}_\ell \rceil \\ \mathrm{dom}\lceil \lambda y.\, \mathcal{M} \rceil &:= \mathrm{dom}\lceil \mathsf{Y}\mathcal{M} \rceil := \mathrm{dom}\lceil \mathsf{score}(\mathcal{M}) \rceil := \mathrm{dom}\lceil \mathcal{M} \rceil \\ \mathrm{dom}\lceil \mathcal{M}\, \mathcal{N} \rceil &:= \mathrm{dom}\lceil \mathcal{M} \rceil \cap \mathrm{dom}\lceil \mathcal{N} \rceil \\ \mathrm{dom}\lceil \mathsf{if}\,(\mathcal{L} \le 0, \mathcal{M}, \mathcal{N}) \rceil &:= \mathrm{dom}\lceil \mathcal{L} \rceil \cap \mathrm{dom}\lceil \mathcal{M} \rceil \cap \mathrm{dom}\lceil \mathcal{N} \rceil \end{aligned}$$

(a) Domain of ⌈·⌉

$$\begin{aligned} \lceil \boxed{f}(\mathcal{V}_1, \dots, \mathcal{V}_\ell) \rceil(\mathbf{r}, \mathbf{s}) &:= \underline{f(r'_1, \dots, r'_\ell)}, \text{ where } \lfloor \mathcal{V}_i \rfloor(\mathbf{r}, \mathbf{s}) = r'_i \text{ for } 1 \le i \le \ell \\ \lceil \mathsf{x}_i \rceil(\mathbf{r}, \mathbf{s}) := \underline{r_i} \qquad \lceil \alpha_j \rceil(\mathbf{r}, \mathbf{s}) &:= \underline{s_j} \qquad \lceil y \rceil(\mathbf{r}, \mathbf{s}) := y \qquad \lceil \underline{r'} \rceil(\mathbf{r}, \mathbf{s}) := \underline{r'} \\ \lceil f(\mathcal{M}_1, \dots, \mathcal{M}_\ell) \rceil(\mathbf{r}, \mathbf{s}) &:= f(\lceil \mathcal{M}_1 \rceil(\mathbf{r}, \mathbf{s}), \dots, \lceil \mathcal{M}_\ell \rceil(\mathbf{r}, \mathbf{s})) \\ \lceil \lambda y.\, \mathcal{M} \rceil(\mathbf{r}, \mathbf{s}) := \lambda y.\, \lceil \mathcal{M} \rceil(\mathbf{r}, \mathbf{s}) \qquad \lceil \mathcal{M}\, \mathcal{N} \rceil(\mathbf{r}, \mathbf{s}) &:= (\lceil \mathcal{M} \rceil(\mathbf{r}, \mathbf{s}))\, (\lceil \mathcal{N} \rceil(\mathbf{r}, \mathbf{s})) \qquad \lceil \mathsf{Y}\mathcal{M} \rceil(\mathbf{r}, \mathbf{s}) := \mathsf{Y}(\lceil \mathcal{M} \rceil(\mathbf{r}, \mathbf{s})) \\ \lceil \mathsf{if}\,(\mathcal{L} \le 0, \mathcal{M}, \mathcal{N}) \rceil(\mathbf{r}, \mathbf{s}) &:= \mathsf{if}\,(\lceil \mathcal{L} \rceil(\mathbf{r}, \mathbf{s}) \le 0, \lceil \mathcal{M} \rceil(\mathbf{r}, \mathbf{s}), \lceil \mathcal{N} \rceil(\mathbf{r}, \mathbf{s})) \\ \lceil \mathsf{sample} \rceil(\mathbf{r}, \mathbf{s}) &:= \mathsf{sample} \qquad \lceil \mathsf{score}(\mathcal{M}) \rceil(\mathbf{r}, \mathbf{s}) := \mathsf{score}(\lceil \mathcal{M} \rceil(\mathbf{r}, \mathbf{s})) \end{aligned}$$

(b) Definition of · on dom ·

Fig. 5: Formal definition of the instantiation and partial evaluation function '·(

semantics is presented as a rewrite system of *symbolic configurations*, which are defined to be triples of the form ⟪*M*, *w*, U⟫ where, for some m and n, *M* ∈ Λ<sup>(m,n)</sup>, U ⊆ dom ⟦*M*⟧ ⊆ **R**<sup>m</sup> × **S**<sup>n</sup> is measurable, and *w* : **R**<sup>m</sup> × **S** ⇀ **R**<sub>≥0</sub> with dom(*w*) = U. Thus we aim to prove the following result (writing *1* for the constant function **λ**(*r*, *s*). 1):

**Theorem 1.** Let M be a term with free variables amongst x1,...,xm.


As formalised by Thm. 1, the key intuition behind symbolic configurations ⟪*M* ,*w*, U⟫ (that are reachable from a given ⟪M, *1*, **R**<sup>m</sup>⟫) is that, whenever *M* is a symbolic value:


moreover, the respective third components U (of the symbolic configurations ⟪*M* ,*w*, U⟫) cover **T**M,term.

To establish Thm. 1, we introduce *symbolic reduction contexts* and *symbolic redexes*. These are presented in Fig. 4c and extend the usual notions (replacing real constants with arbitrary symbolic values of type R).

Using Lem. 2 we obtain:

**Lemma 4.** If *R* is a symbolic redex and (*r*, *s*) ∈ dom ⟦*R*⟧ then ⟦*R*⟧(*r*, *s*) is a redex.

The following can be proven by a straightforward induction:

**Lemma 5 (Subject Construction).** Let *M* be a symbolic term.


The partial instantiation function also extends to symbolic contexts *E* in the evident way – we give the full definition in [34].

Now, we introduce the following rules for *symbolic redex contractions*:

$$\begin{aligned}
\langle\!\langle (\lambda y.\,\mathcal{M})\,\mathcal{V},\, w,\, U \rangle\!\rangle &\Rightarrow \langle\!\langle \mathcal{M}[\mathcal{V}/y],\, w,\, U \rangle\!\rangle \\
\langle\!\langle \underline{f}(\mathcal{V}_1,\ldots,\mathcal{V}_\ell),\, w,\, U \rangle\!\rangle &\Rightarrow \langle\!\langle f(\mathcal{V}_1,\ldots,\mathcal{V}_\ell),\, w,\, \operatorname{dom} f(\mathcal{V}_1,\ldots,\mathcal{V}_\ell) \cap U \rangle\!\rangle \\
\langle\!\langle \mathsf{Y}(\lambda y.\,\mathcal{M}),\, w,\, U \rangle\!\rangle &\Rightarrow \langle\!\langle \lambda z.\,\mathcal{M}[\mathsf{Y}(\lambda y.\,\mathcal{M})/y]\, z,\, w,\, U \rangle\!\rangle \\
\langle\!\langle \mathsf{if}(\mathcal{V} \leq \underline{0}, \mathcal{M}, \mathcal{N}),\, w,\, U \rangle\!\rangle &\Rightarrow \langle\!\langle \mathcal{M},\, w,\, \|\mathcal{V}\|^{-1}(-\infty, 0] \cap U \rangle\!\rangle \\
\langle\!\langle \mathsf{if}(\mathcal{V} \leq \underline{0}, \mathcal{M}, \mathcal{N}),\, w,\, U \rangle\!\rangle &\Rightarrow \langle\!\langle \mathcal{N},\, w,\, \|\mathcal{V}\|^{-1}(0, \infty) \cap U \rangle\!\rangle \\
\langle\!\langle \mathsf{sample},\, w,\, U \rangle\!\rangle &\Rightarrow \langle\!\langle \alpha_{n+1},\, w',\, U' \rangle\!\rangle \qquad (\text{if } U \subseteq \mathbb{R}^m \times \mathbb{S}_n) \\
\langle\!\langle \mathsf{score}(\mathcal{V}),\, w,\, U \rangle\!\rangle &\Rightarrow \langle\!\langle \mathcal{V},\, \|\mathcal{V}\| \cdot w,\, \|\mathcal{V}\|^{-1}[0, \infty) \cap U \rangle\!\rangle
\end{aligned}$$

In the rule for sample, U′ := {(*r*, *s* ++ [s′]) | (*r*, *s*) ∈ U ∧ s′ ∈ (0, 1)} and *w*′(*r*, *s* ++ [s′]) := *w*(*r*, *s*); in the rule for score(*V*), (∥*V*∥ · *w*)(*r*, *s*) := ∥*V*∥(*r*, *s*) · *w*(*r*, *s*).

The rules are designed to closely mirror their concrete counterparts. Crucially, the rule for sample introduces a "fresh" sampling variable, and the two rules for conditionals split the last component U ⊆ **R**<sup>m</sup>×**S**<sup>n</sup> according to whether ∥*V*∥(*r*, *s*) ≤ 0 or ∥*V*∥(*r*, *s*) > 0. The "delay" contraction (second rule) is introduced for a technical reason: ultimately, to enable item 1 (Soundness). Otherwise it would, for example, be unclear whether λy. α<sub>1</sub> + 1 should correspond to λy. 0.5 + 1 or λy. 1.5 for s<sub>1</sub> = 0.5.
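The branching behaviour of these contraction rules can be mimicked concretely. The following Python sketch is our own illustration (the representation, the `a1, a2, ...` naming of sampling variables, and the toy source language are assumptions, not the paper's SPCF): it enumerates the control-flow paths of a small program, where each conditional splits the trace space into the subsets `v <= 0` and `v > 0`, and each path accumulates a constraint list (carving out U) and a symbolic weight.

```python
# Toy symbolic execution: each conditional forks the trace space into
# {v <= 0} and {v > 0}; sampling introduces fresh variables a1, a2, ...
# (Illustrative sketch only; the representation is ours, not the paper's.)

class Fork(Exception):
    def __init__(self, alternatives):
        self.alternatives = alternatives

class SymState:
    def __init__(self, forced):
        self.forced = forced     # branch outcomes to replay, in order
        self.taken = []          # outcomes consumed during this run
        self.n = 0               # sampling variables introduced so far
        self.constraints = []    # path constraints describing U
        self.weight = []         # symbolic factors of the weight w

    def sample(self):
        self.n += 1
        return f"a{self.n}"      # fresh symbolic sampling variable

    def score(self, v):
        self.weight.append(v)    # multiply the weight by v

    def if_leq(self, v, then_fn, else_fn):
        if len(self.taken) == len(self.forced):   # unexplored split: fork
            raise Fork([self.forced + [True], self.forced + [False]])
        b = self.forced[len(self.taken)]
        self.taken.append(b)
        self.constraints.append(f"{v} <= 0" if b else f"{v} > 0")
        return then_fn() if b else else_fn()

def symbolic_paths(program):
    """Depth-first enumeration of every control-flow path of `program`."""
    paths, stack = [], [[]]
    while stack:
        st = SymState(stack.pop())
        try:
            value = program(st)
            paths.append((st.constraints, st.weight or ["1"], value))
        except Fork as f:
            stack.extend(f.alternatives)
    return paths

# x = sample; if x - 0.5 <= 0 then (score(2*x); x) else x
def prog(st):
    x = st.sample()
    def then_branch():
        st.score(f"2*{x}")
        return x
    return st.if_leq(f"{x} - 0.5", then_branch, lambda: x)

for constraints, weight, value in symbolic_paths(prog):
    print(constraints, weight, value)
```

For this program the enumeration yields exactly two paths, mirroring the two conditional rules; note that recursion (Y) would make the tree infinitely deep, though still finitely branching, so a practical tool would add a depth bound.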

Finally we lift this to arbitrary symbolic terms using the obvious rule for symbolic evaluation contexts:

$$\frac{\langle\!\langle \mathcal{R},\, w,\, U \rangle\!\rangle \Rightarrow \langle\!\langle \mathcal{R}',\, w',\, U' \rangle\!\rangle}{\langle\!\langle \mathcal{E}[\mathcal{R}],\, w,\, U \rangle\!\rangle \Rightarrow \langle\!\langle \mathcal{E}[\mathcal{R}'],\, w',\, U' \rangle\!\rangle}$$

Note that we do not need rules corresponding to reductions to fail because the third component of the symbolic configurations "filters out" the pairs (*r*, *s*) corresponding to undefined behaviour. In particular, the following holds:

**Lemma 6.** Suppose ⟪*M*, *w*, U⟫ is a symbolic configuration and ⟪*M*, *w*, U⟫ ⇒ ⟪*N*, *w*′, U′⟫. Then ⟪*N*, *w*′, U′⟫ is a symbolic configuration.

A key advantage of the symbolic execution is that the induced computation tree is finitely branching, since branching only arises from conditionals, splitting the trace space into disjoint subsets. This contrasts with the concrete situation (from Sec. 3), in which sampling creates uncountably many branches.

**Lemma 7 (Basic Properties).** Let ⟪*M* ,*w*, U⟫ be a symbolic configuration. Then


Crucially, there is a correspondence between the concrete and symbolic semantics in that they can "simulate" each other:

**Proposition 1 (Correspondence).** Suppose ⟪*M*, *w*, U⟫ is a symbolic configuration, and (*r*, *s*) ∈ U. Let M ≡ ⟦*M*⟧(*r*, *s*) and w := *w*(*r*, *s*). Then

1. If ⟪*M*, *w*, U⟫ ⇒ ⟪*N*, *w*′, U′⟫ and (*r*, *s* ++ *s*′) ∈ U′ then

> ⟨M, w, *s*⟩ → ⟨⟦*N*⟧(*r*, *s* ++ *s*′), *w*′(*r*, *s* ++ *s*′), *s* ++ *s*′⟩.

2. If ⟨M, w, *s*⟩ → ⟨N, w′, *s* ++ *s*′⟩ then there exists ⟪*M*, *w*, U⟫ ⇒ ⟪*N*, *w*′, U′⟫ such that ⟦*N*⟧(*r*, *s* ++ *s*′) ≡ N, *w*′(*r*, *s* ++ *s*′) = w′ and (*r*, *s* ++ *s*′) ∈ U′.

As a consequence of Prop. 1, we obtain a proof of Thm. 1.

# **6 Densities of Almost Surely Terminating Programs are Differentiable Almost Everywhere**

So far we have seen that the symbolic execution semantics provides a sound and complete way to reason about the weight and value functions. In this section we impose further restrictions on the primitive operations and the terms to obtain results about the differentiability of these functions.

Henceforth we assume Assumption 1 and we fix a term M with free variables amongst x1,...,xm.

From Lem. 3 we immediately obtain the following:

**Lemma 8.** Let ⟪*M*, *w*, U⟫ be a symbolic configuration such that *w* is differentiable on Ů and μ(∂U) = 0. If ⟪*M*, *w*, U⟫ ⇒ ⟪*M*′, *w*′, U′⟫ then *w*′ is differentiable on Ů′ and μ(∂U′) = 0.

### **6.1 Differentiability on Terminating Traces**

As an immediate consequence of the preceding, Lem. 3 and Soundness (item 1 of Thm. 1), whenever ⟪M, *1*, **R**<sup>m</sup>⟫ ⇒<sup>∗</sup> ⟪*V*, *w*, U⟫, the functions weight<sup>M</sup> and value<sup>M</sup> are differentiable everywhere on Ů.

Recall the set **<sup>T</sup>**M,term of (*r*, *<sup>s</sup>*) <sup>∈</sup> **<sup>R</sup>**<sup>m</sup>×**<sup>S</sup>** from Eq. (1) for which <sup>M</sup> terminates. We abbreviate **T**M,term to **T**term and define

$$\begin{aligned}
\mathbb{T}_{\mathsf{term}} &:= \mathbb{T}_{M,\mathsf{term}} = \{(r, s) \in \mathbb{R}^m \times \mathbb{S} \mid \exists V, w.\ \langle M[\underline{r}/x], 1, []\rangle \to^* \langle V, w, s\rangle\} \\
\mathbb{T}^{\mathsf{int}}_{\mathsf{term}} &:= \bigcup \{\mathring{U} \mid \exists \mathcal{V}, w.\ \langle\!\langle M, 1, \mathbb{R}^m \rangle\!\rangle \Rightarrow^* \langle\!\langle \mathcal{V}, w, U \rangle\!\rangle\}
\end{aligned}$$

By Completeness (item 2 of Thm. 1), **T**term = ⋃{U | ∃*V*, *w*. ⟪M, *1*, **R**<sup>m</sup>⟫ ⇒<sup>∗</sup> ⟪*V*, *w*, U⟫}. Therefore, being countable unions of measurable sets (Lemmas 6 and 7), **T**term and **T**<sup>int</sup>term are measurable.

By what we have said above, weight<sup>M</sup> and value<sup>M</sup> are differentiable everywhere on **T**<sup>int</sup>term. Observe that, in general, **T**<sup>int</sup>term ⊊ **T**˚term. However,

$$\mu\left(\mathbb{T}_{\mathsf{term}} \setminus \mathbb{T}^{\mathsf{int}}_{\mathsf{term}}\right) = \mu\Bigg(\bigcup_{U :\, \langle\!\langle M, 1, \mathbb{R}^m \rangle\!\rangle \Rightarrow^* \langle\!\langle \mathcal{V}, w, U \rangle\!\rangle} \big(U \setminus \mathring{U}\big)\Bigg) \leq \sum_{U :\, \langle\!\langle M, 1, \mathbb{R}^m \rangle\!\rangle \Rightarrow^* \langle\!\langle \mathcal{V}, w, U \rangle\!\rangle} \mu(\partial U) = 0 \quad (2)$$

The first equation holds because the U-indexed union is of pairwise disjoint sets. The inequality is due to (U \ Ů) ⊆ ∂U. The last equation above holds because each μ(∂U) = 0 (Assumption 1 and Lem. 8).

Thus we conclude:

**Theorem 2.** Let M be an SPCF term. Then its weight function weight<sup>M</sup> and value function value<sup>M</sup> are differentiable for almost all terminating traces.
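To make Thm. 2 concrete, consider the following example (ours, not taken from the paper): the closed term M ≡ (λx. if(x − 0.5 ≤ 0, score(2 · x), 1)) sample terminates on every trace [s<sub>1</sub>] with s<sub>1</sub> ∈ (0, 1), and its weight function is

$$\mathsf{weight}_M([s_1]) = \begin{cases} 2\,s_1 & \text{if } 0 < s_1 \leq 0.5 \\ 1 & \text{if } 0.5 < s_1 < 1 \end{cases}$$

which is differentiable at every s<sub>1</sub> ≠ 0.5; the exceptional set {0.5}, arising from the boundary of the conditional's split of the trace space, is indeed μ-null.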

### **6.2 Differentiability for Almost Surely Terminating Terms**

Next, we would like to extend this insight for almost surely terminating terms to suitable subsets of **<sup>R</sup>**<sup>m</sup> <sup>×</sup> **<sup>S</sup>**, the union of which constitutes almost the entirety of **<sup>R</sup>**<sup>m</sup>×**S**. Therefore, it is worth examining consequences of almost sure termination (see Def. 1).

We say that (*r*, *s*) ∈ **R**<sup>m</sup> × **S** is *maximal* (for M) if ⟨M[*r*/*x*], 1, []⟩ →<sup>∗</sup> ⟨N, w, *s*⟩ for some N and w, and for all *s*′ ∈ **S** \ {[]} and N′, it is not the case that ⟨N, w, *s*⟩ →<sup>∗</sup> ⟨N′, w′, *s* ++ *s*′⟩. Intuitively, *s* contains a maximal number of samples with which to reduce M[*r*/*x*]. Let **T**max be the set of maximal (*r*, *s*).

Note that **T**term ⊆ **T**max, and there are terms for which the inclusion is strict (e.g. for the diverging term M ≡ Y(λf. f), [] ∈ **T**max but [] ∉ **T**term). Besides,

Fig. 6: Illustration of how **<sup>R</sup>**<sup>m</sup> <sup>×</sup> **<sup>S</sup>** – visualised as the entire rectangle – is partitioned to prove Thm. 3. The value function returns ⊥ in the red dotted area and a closed value elsewhere (i.e. in the blue shaded area).

**T**max is measurable because, thanks to Prop. 1, for every n ∈ **N**,

$$\{(r, s) \in \mathbb{R}^m \times \mathbb{S}_n \mid \exists N, w.\ \langle M[\underline{r}/x], 1, []\rangle \to^* \langle N, w, s\rangle\} = \bigcup_{U :\, \langle\!\langle M, 1, \mathbb{R}^m \rangle\!\rangle \Rightarrow^* \langle\!\langle \mathcal{N}, w, U \rangle\!\rangle} U \cap (\mathbb{R}^m \times \mathbb{S}_n)$$

and the RHS is a countable union of measurable sets (Lemmas 6 and 7).

The following is a consequence of the definition of almost sure termination and a corollary of Fubini's theorem (see [34] for details):

**Lemma 9.** If M terminates almost surely then μ(**T**max \ **T**term)=0.

Now, observe that for all (*r*, *s*) ∈ **R**<sup>m</sup> × **S**, exactly one of the following holds:

1. (*r*, *s*) is maximal;
2. a proper prefix of *s* is maximal, i.e. (*r*, *s*) ∈ **T**pref (defined below);
3. (*r*, *s*) is stuck: the reduction of M[*r*/*x*] consumes all of *s* and then requires a further sample.

Formally, we say (*r*, *s*) is *stuck* if ⟨M[*r*/*x*], 1, []⟩ →<sup>∗</sup> ⟨E[sample], w, *s*⟩, and we let **T**stuck be the set of all (*r*, *s*) which get stuck. Thus,

$$
\mathbb{R}^m \times \mathbb{S} = \mathbb{T}_{\mathsf{max}} \cup \mathbb{T}_{\mathsf{pref}} \cup \mathbb{T}_{\mathsf{stuck}}
$$

where **T**pref := {(*r*, *s* ++ *s*′) | (*r*, *s*) ∈ **T**max ∧ *s*′ ≠ []}, and the union is disjoint.

Defining **T**<sup>int</sup>stuck := ⋃{Ů | ⟪M, *1*, **R**<sup>m</sup>⟫ ⇒<sup>∗</sup> ⟪*E*[sample], *w*, U⟫} we can argue analogously to Eq. (2) that μ(**T**stuck \ **T**<sup>int</sup>stuck) = 0.

Moreover, for **T**<sup>int</sup>pref := {(*r*, *s* ++ *s*′) | (*r*, *s*) ∈ **T**<sup>int</sup>term and *s*′ ∈ **S** \ {[]}} it holds that

$$\mathbb{T}_{\mathsf{pref}} \setminus \mathbb{T}^{\mathsf{int}}_{\mathsf{pref}} = \bigcup_{n \in \mathbb{N}} \left\{(r, s +\!\!+ s') \mid (r, s) \in \mathbb{T}_{\mathsf{max}} \setminus \mathbb{T}^{\mathsf{int}}_{\mathsf{term}} \wedge s' \in \mathbb{S}_n\right\}$$

and hence μ(**T**pref \ **T**<sup>int</sup>pref) ≤ Σ<sub>n∈**N**</sub> μ(**T**max \ **T**<sup>int</sup>term) = 0, since each summand vanishes by Lem. 9 and Eq. (2).

Finally, we define

$$\mathbb{T} := \mathbb{T}^{\mathsf{int}}\_{\mathsf{term}} \cup \mathbb{T}^{\mathsf{int}}\_{\mathsf{pref}} \cup \mathbb{T}^{\mathsf{int}}\_{\mathsf{stuck}}$$

Clearly, this is an open set and the situation is illustrated in Fig. 6. By what we have seen,

$$\mu\left((\mathbb{R}^m \times \mathbb{S}) \setminus \mathbb{T}\right) = \mu(\mathbb{T}_{\mathsf{term}} \setminus \mathbb{T}^{\mathsf{int}}_{\mathsf{term}}) + \mu(\mathbb{T}_{\mathsf{pref}} \setminus \mathbb{T}^{\mathsf{int}}_{\mathsf{pref}}) + \mu(\mathbb{T}_{\mathsf{stuck}} \setminus \mathbb{T}^{\mathsf{int}}_{\mathsf{stuck}}) = 0$$

Moreover, to conclude the proof of our main result Thm. 3 it suffices to note:


**Theorem 3.** Let M be an SPCF term (possibly with free variables of type R) which terminates almost surely. Then its weight function weight<sup>M</sup> and value function value<sup>M</sup> are differentiable almost everywhere.

We remark that almost sure termination was not used in our development until the proof of Lem. 9. For Thm. 3 we could have instead directly assumed the conclusion of Lem. 9; that is, almost all maximal traces are terminating. This is a strictly weaker condition than almost sure termination. The exposition we give is more appropriate: almost sure termination is a standard notion, and the development of methods to prove almost sure termination is a subject of active research.

We also note that the technique used in this paper to establish almost everywhere differentiability could be used to target another "almost everywhere" property instead: one can simply remove the requirement that elements of F are differentiable, and replace it with the desired property. A basic example of this is smoothness.

# **7 Conclusion**

We have solved an open problem in the theory of probabilistic programming. This is mathematically interesting, and motivated the development of stochastic symbolic execution, a more informative form of operational semantics in this context. The result is also of major practical interest, since almost everywhere differentiability is necessary for correct gradient-based inference.

**Related Work.** This problem was partially addressed in the work of Zhou et al. [55] who prove a restricted form of our theorem for recursion-free first-order programs with analytic primitives. Our stochastic symbolic execution is related to their compilation scheme, which we extend to a more general language.

The idea of considering the possible control paths through a probabilistic program is fairly natural and not new to this paper; it has been used in the design of specialised inference algorithms for probabilistic programming, see [11,56]. To our knowledge, however, this is the first semantic formalisation of the concept, and the first time it is used to reason about whole-program density.

The notions of weight function and value function in this paper are inspired by the more standard trace-based operational semantics of Borgström et al. [8] (see also [52,31]).

Mazza and Pagani [35] study the correctness of automatic differentiation (AD) of purely deterministic programs. This problem is orthogonal to the work reported here, but it is interesting to combine their result with ours. Specifically, we show a.e. differentiability whilst [35] proves a.s. correctness of AD on the differentiable domain. Combining both results one concludes that for a deterministic program, AD returns a correct gradient a.s. on the entire domain. Going deeper into the comparison, Mazza and Pagani propose a notion of admissible primitive function strikingly similar to ours: given continuity, their condition 2 and our condition 3 are equivalent. On the other hand we require admissible functions to be differentiable, when they are merely continuous in [35]. Finally, we conjecture that "stable points", a central notion in [35], have a clear counterpart within our framework: for a symbolic evaluation path arriving at ⟪*V*, w, U⟫, for *V* a symbolic value, the points of Ů are precisely the stable points.

Our work is also connected to recent developments in differentiable programming. Lee et al. [30] study the family of piecewise functions under analytic partition, or just "PAP" functions. PAP functions are a well-behaved family of almost everywhere differentiable functions, which can be used to reason about automatic differentiation in recursion-free first-order programs. An interesting question is whether this can be extended to a more general language, and whether densities of almost surely terminating SPCF programs are PAP functions. (See also [19,9] for work on differentiable programs without conditionals.)

A similar class of functions is also introduced by Bolte and Pauwels [7] in very recent work; this is used to prove a convergence result for stochastic gradient descent in deep learning. Whether this class of functions can be used to reason about probabilistic program densities remains to be explored.

Finally we note that open logical relations [1] are a convenient proof technique for establishing properties of programs which hold at first order, such as almost everywhere differentiability. This approach remains to be investigated in this context, as the connection with probabilistic densities is not immediate.

**Further Directions.** This investigation would benefit from a denotational treatment; this is not currently possible as existing models of probabilistic programming do not account for differentiability.

In another direction, it is likely that we can generalise the main result by extending SPCF with recursive types, as in [51], and, more speculatively, first-class differential operators as in [17]. It would also be useful to add to SPCF a family of discrete distributions, and more generally continuous-discrete mixtures, which have practical applications [36].

Our work will have interesting implications in the correctness of various gradient-based inference algorithms, such as the recent discontinuous HMC [39] and reparameterisation gradient for non-differentiable models [32]. But given the lack of guarantees of correctness properties available until now, these algorithms have not yet been developed in full generality, leaving many perspectives open.

Acknowledgements. We thank Wonyeol Lee for spotting an error in an example. We gratefully acknowledge support from EPSRC and the Royal Society.

# **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Graded Modal Dependent Type Theory**

Benjamin Moon<sup>1</sup>(✉), Harley Eades III<sup>2</sup>, and Dominic Orchard<sup>1</sup>

> <sup>1</sup> University of Kent, Canterbury, UK {bgm4,d.a.orchard}@kent.ac.uk <sup>2</sup> Augusta University, Augusta, USA harley.eades@gmail.com

**Abstract.** Graded type theories are an emerging paradigm for augmenting the reasoning power of types with parameterizable, fine-grained analyses of program properties. There have been many such theories in recent years which equip a type theory with quantitative dataflow tracking, usually via a semiring-like structure which provides analysis on variables (often called 'quantitative' or 'coeffect' theories). We present Graded Modal Dependent Type Theory (Grtt for short), which equips a dependent type theory with a general, parameterizable analysis of the flow of data, both in and between computational terms and types. In this theory, it is possible to study, restrict, and reason about data use in programs and types, enabling, for example, parametric quantifiers and linearity to be captured in a dependent setting. We propose Grtt, study its metatheory, and explore various case studies of its use in reasoning about programs and studying other type theories. We have implemented the theory and highlight the interesting details, including showing an application of grading to optimising the type checking procedure itself.

# **1 Introduction**

The difference between simply-typed, polymorphically-typed, and dependently-typed languages can be characterised by the dataflow permitted by each type theory. In each, dataflow can be enacted by substituting a term for occurrences of a variable in another term, the scope of which is delineated by a binder. In the simply-typed λ-calculus, data can only flow in 'computational' terms; computations and types are separate syntactic categories, with variables, bindings (λ), and substitution—and thus dataflow—only at the computational level. In contrast, polymorphic calculi like System F [26,52] permit dataflow within types, via type quantification (∀), and a limited form of dataflow from computations to types, via type abstraction (Λ) and type application. Dependently-typed calculi (e.g., [14,40,41,42]) break down the barrier between computations and types further: variables are bound simultaneously in types and computations, such that data can flow both to computations and types via dependent functions (Π) and application. This pervasive dataflow enables the Curry-Howard correspondence to be leveraged for program reasoning and theorem proving [59]. However, unrestricted dataflow between computations and types can impede reasoning and can interact poorly with other type theoretic ideas.

© The Author(s) 2021 N. Yoshida (Ed.): ESOP 2021, LNCS 12648, pp. 462–490, 2021. https://doi.org/10.1007/978-3-030-72019-3_17

Firstly, System F allows parametric reasoning and notions of representation independence [53,57], but this is lost in general in dependently-typed languages when quantifying over higher-kinded types [45] (rather than just 'small' types [7,36]). Furthermore, unrestricted dataflow impedes efficient compilation as compilers do not know, from the types alone, where a term is actually needed. Additional static analyses are needed to recover dataflow information for optimisation and reasoning. For example, a term shown to be used only for type checking (not flowing to the computational 'run time' level) can be erased [9]. Thus, dependent theories do not expose the distinction between proof relevant and irrelevant terms, requiring extensions to capture irrelevance [4,50,51]. Whilst unrestricted dataflow between computations and terms has its benefits, the permissive nature of dependent types can hide useful information. This permissiveness also interacts poorly with other type theories which seek to deliberately restrict dataflow, notably linear types.

Linear types allow data to be treated as a 'resource' which must be consumed exactly once: linearly-typed values are restricted to linear dataflow [27,58,60]. Reasoning about resourceful data has been exploited by several languages, e.g., ATS [54], Alms [56], Clean [18], Granule [46], and Linear Haskell [8]. However, linear dataflow is rare in a dependently-typed setting. Consider typing the body of the polymorphic identity function in Martin-Löf type theory:

$$a : \mathsf{Type},\ x : a \vdash x : a$$

This judgment uses a twice (typing x in the context and the subject of the judgment) and x once in the term but not at all in the type. There have been various attempts to meaningfully reconcile linear and dependent types [12,15,37,39] usually by keeping them separate, allowing types to depend only on non-linear variables. All such theories cannot distinguish variables used for computation from those used purely for type formation, which could be erased at runtime.

Recent work by McBride [43], refined by Atkey [6], generalises ideas from 'coeffect analyses' (variable usage analyses, like that of Petricek et al. [49]) to a dependently-typed setting to reconcile the ubiquitous flow of data in dependent types with the restricted dataflow of linearity. This approach, called Quantitative Type Theory (Qtt), types the above example as:

$$a \stackrel{0}{\colon} \mathsf{Type}, x \stackrel{1}{\colon} a \vdash x \stackrel{1}{\colon} a$$

The annotation 0 on a explains that we can use a to form a type, but we cannot, or do not, use it at the term level, thus it can be erased at runtime. The cornerstone of Qtt's approach is that dataflow of a term to the type level counts as 0 use, so arbitrary type-level use is allowed whilst still permitting quantitative analysis of computation-level dataflow. Whilst this gives a useful way to relate linear and dependent types, it cannot however reason about dataflow at the typelevel (all type-level usage counts as 0). Thus, for example, Qtt cannot express that a variable is used just computationally but not at all in types.

In an extended abstract, Abel proposes a generalisation of Qtt to track variable use in both types and computations [2], suggesting that tracking in types

enables type checking optimisations and increased expressivity. We develop a core dependent type theory along the same lines, using the paradigm of grading: graded systems augment types with additional information, capturing the structure of programs [23,46]. We therefore name our approach Graded Modal Dependent Type Theory (Grtt for short). Our type theory is parameterised by a semiring which, like other coeffect and quantitative approaches [3,6,10,25,43,49,61], describes dataflow through a program, but in both types and computations equally, remedying Qtt's inability to track type-level use. We extend Abel's initial idea by presenting a rich language, including dependent tensors, a complete metatheory, and a graded modality which aids the practical use of this approach (e.g., enabling functions to use components of data non-uniformly). The result is a calculus which extends the power of existing non-dependent graded languages, like Granule [46], to a dependent setting.

We begin with the definition of Grtt in Section 2, before demonstrating the power of Grtt through case studies in Section 3, where we show how to use grading to restrict Grtt terms to simply-typed reasoning, parametric reasoning (regaining universal quantification smoothly within a dependent theory), existential types, and linear types. The calculus can be instantiated to different kinds of dataflow reasoning: we show an example application to information-flow security. We then show the metatheory of Grtt in Section 4: admissibility of graded structural rules, substitution, type preservation, and strong normalisation.

We implemented a prototype language based on Grtt called **Gerty**. <sup>3</sup> We briefly mention its syntax in Section 2.5 for use in examples. Later, Section 5 describes how the formal definition of Grtt is implemented as a bidirectional type checking algorithm, interfacing with an SMT solver to solve constraints over grades. Furthermore, Abel conjectured that a quantitative dependent theory could enable usage-based optimisation of type-checking itself [2], which would assist dependently-typed programming at scale. We validate this claim in Section 5 showing a grade-directed optimisation to **Gerty**'s type checker.

Section 6 discusses next steps for increasing the expressive power of Grtt. Full proofs and details are provided in the extended version of this paper [44].

**Gerty** has some similarity to Granule [46]: both are functional languages with graded types. However, Granule has a linearly typed core and no dependent types (only indexed types), thus has no need for resource tracking at the type level (type indices are not subject to tracking and their syntax is restricted).

# **2 GrTT: Graded Modal Dependent Type Theory**

Grtt augments a standard presentation of dependent type theory with 'grades' (elements of a semiring) which account for how variables are used, i.e., their dataflow. Whilst existing work uses grades to describe usage only in computational terms (e.g. [10]), Grtt incorporates additional grades to account for how variables are used in types. We introduce here the syntax and typing, and briefly show the syntax of the implementation. Section 4 describes its metatheory.

<sup>3</sup> https://github.com/granule-project/gerty/releases/tag/esop2021

### **2.1 Syntax**

The syntax of Grtt is that of a standard Martin-Löf type theory, with the addition of a graded modality and grade annotations on function and tensor binders. Throughout, s and r range over grades, which are elements of a semiring (R, ∗, 1, +, 0). It is instructive to instantiate this semiring to the natural number semiring (N, ×, 1, +, 0), which captures the exact number of times variables are used. We appeal to this example in descriptions here.
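To make the parameterisation concrete, here is a small Python sketch of ours (a hypothetical rendering — the paper's implementation **Gerty** is written quite differently) of the semiring interface together with the exact-usage instance (N, ×, 1, +, 0), plus the pointwise grade-vector operations that graded systems of this kind use to combine and scale usage information:

```python
# A grade semiring (R, *, 1, +, 0), plus pointwise operations on grade
# vectors. Names and representation are illustrative assumptions.
from dataclasses import dataclass
from typing import Any, Callable

@dataclass(frozen=True)
class Semiring:
    zero: Any
    one: Any
    add: Callable[[Any, Any], Any]
    mul: Callable[[Any, Any], Any]

# Exact-usage instance (N, x, 1, +, 0): grades count variable uses.
nat = Semiring(zero=0, one=1, add=lambda a, b: a + b, mul=lambda a, b: a * b)

def vec_add(S, u, v):
    """Pointwise sum of two grade vectors of equal length."""
    assert len(u) == len(v)
    return [S.add(a, b) for a, b in zip(u, v)]

def vec_scale(S, s, v):
    """Scale a grade vector by a grade s."""
    return [S.mul(s, a) for a in v]

# Combining usage across two subterms, and scaling under substitution:
print(vec_add(nat, [0, 1, 0], [1, 0, 0]))   # [1, 1, 0]
print(vec_scale(nat, 2, [1, 0, 1]))         # [2, 0, 2]
```

Swapping `nat` for another instance (e.g. a 0-1-many lattice) changes the analysis without changing the machinery, which is the point of the semiring parameterisation.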

Grtt has a single syntactic sort for computations and types:

$$\begin{array}{rl}
(\textit{terms}) \quad t, A, B ::= & x \mid \mathsf{Type}_l \\
\mid & (x :_{(s,r)} A) \to B \mid \lambda x.\, t \mid t_1\ t_2 \\
\mid & (x :_{r} A) \otimes B \mid (t_1, t_2) \mid \mathsf{let}\ (x, y) = t_1\ \mathsf{in}\ t_2 \\
\mid & \Box_s A \mid \Box t \mid \mathsf{let}\ \Box x = t_1\ \mathsf{in}\ t_2
\end{array}$$

Terms include variables and a constructor for an inductive hierarchy of universes, annotated by a level l. Dependent function types are annotated with a pair of grades s and r, with s capturing how x is used in the body of the inhabiting function and r capturing how x is used in the codomain B. Dependent tensors have a single grade r, which describes how the first element is used in the typing of the second. The graded modal type operator □<sub>s</sub> A 'packages' a term and its dependencies so that values of type A can be used with grade s in the future. Graded modal types are introduced via promotion □t and eliminated via let □x = t<sub>1</sub> in t<sub>2</sub>. The following sections explain the semantics of each piece of syntax with respect to its typing. We typically use A and B to connote terms used as types.
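A fragment of this grammar can be transcribed as a small AST; the following Python rendering is our own sketch (the constructor and field names are assumptions, not taken from the paper or its implementation), covering variables, universes, graded dependent functions, and the graded modality:

```python
# AST sketch for a fragment of Grtt terms.
from dataclasses import dataclass
from typing import Any

@dataclass(frozen=True)
class Var:            # x
    name: str

@dataclass(frozen=True)
class Type:           # Type_l
    level: int

@dataclass(frozen=True)
class Pi:             # (x :(s,r) A) -> B
    x: str
    s: Any            # grade of x in the inhabiting function's body
    r: Any            # grade of x in the codomain B
    dom: Any
    cod: Any

@dataclass(frozen=True)
class Lam:            # \x. t
    x: str
    body: Any

@dataclass(frozen=True)
class Box:            # graded modal type, packaging A at grade s
    s: Any
    ty: Any

@dataclass(frozen=True)
class Promote:        # introduction form of the modality
    t: Any

# The identity function, typed so its argument is used once in the body
# and not at all in the codomain: \x. x : (x :(1,0) A) -> A.
id_ty = Pi("x", 1, 0, Var("A"), Var("A"))
id_tm = Lam("x", Var("x"))
```

The grade pair on `Pi` is the syntactic home of the analysis: changing `(1, 0)` to `(0, 0)` would claim the argument is never used, which the typing rules of the next sections would reject for this term.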

### **2.2 Typing Judgments, Contexts, and Grading**

Typing judgments are written in either of the following two equivalent forms:

$$(\Delta \mid \sigma\_1 \mid \sigma\_2) \odot \Gamma \vdash t : A \qquad \qquad \begin{pmatrix} \Delta \\ \sigma\_1 \\ \sigma\_2 \end{pmatrix} \odot \Gamma \vdash t : A$$

The 'horizontal' syntax (left) is used most often, with the equivalent 'vertical' form (right) used for clarity in some places. Ignoring the part to the left of ⊙, typing judgments and their rules are essentially those of Martin-Löf type theory (with the addition of the modality), where Γ ranges over usual dependently-typed typing contexts. The left of ⊙ provides the grading information, where σ and Δ range over grade vectors and context grade vectors respectively, of the form:

$$\begin{array}{ccc}
(\textit{contexts}) & (\textit{grade vectors}) & (\textit{context grade vectors}) \\
\Gamma ::= \emptyset \mid \Gamma, x : A & \sigma ::= \emptyset \mid \sigma, s & \Delta ::= \emptyset \mid \Delta, \sigma
\end{array}$$

A grade vector σ is a vector of semiring elements, and a context vector Δ is a vector of grade vectors. We write (s1,...,sn) to denote an n-vector and likewise for context grade vectors. We omit parentheses when this would not cause ambiguity. Throughout, a comma is used to concatenate vectors and disjoint contexts, and to extend vectors with a single grade, grade vector, or typing assumption.

For a judgment (Δ | σs | σr) ⊙ Γ ⊢ t : A the vectors Γ, Δ, σs, and σr are all of equal size. Given a typing assumption y : B at index i in Γ, the grade σs[i] ∈ R denotes the use of y in t (the subject of the judgment), the grade σr[i] ∈ R denotes the use of y in A (the subject's type), and Δ[i] ∈ R^i (of size i) describes how assumptions prior to y are used to form y's type, B.

Consider the following example, which types the body of a function that takes two arguments of type a, and returns only the first:

$$\left( \begin{array}{c} ( ), (1), (1, 0) \\ 0, 1, 0 \\ 1, 0, 0 \end{array} \right) \odot a : \mathsf{Type}\_l, x : a, y : a \vdash x : a$$

Let the context grade vector be called Δ. Then Δ[0] = () (the empty vector) explains that no assumptions are used to type a in the context, as Type_l is a closed term and a is the first assumption. Δ[1] = (1) explains that the first assumption a is used (grade 1) in the typing of x in the context, and Δ[2] = (1, 0) explains that a is used once in the typing of y in the context, and x is unused in the typing of y. The subject grade vector σs = (0, 1, 0) explains that a is unused in the subject, x is used once, and y is unused. Finally, the subject-type grade vector σr = (1, 0, 0) explains that a appears once in the subject's type (which is just a), and x and y are unused in the formation of the subject's type.

To aid reading, recall that standard typing rules typically have the form context ⊢ subject : subject-type, the order of which is reflected by (Δ | σs | σr) ⊙ ... giving the context, subject, and subject-type grading respectively.

Well-formed Contexts The relation Δ ⊙ Γ ⊢ identifies a context Γ as well-formed with respect to context grade vector Δ, defined by the following rules:

$$\frac{}{\emptyset \odot \emptyset \vdash}\ \mathsf{wf}\_{\emptyset} \qquad\qquad \frac{(\Delta \mid \sigma \mid \mathbf{0}) \odot \Gamma \vdash A : \mathsf{Type}\_{l}}{(\Delta, \sigma) \odot \Gamma, x : A \vdash}\ \mathsf{wf}\_{\mathsf{Ext}}$$

Unlike typing, well-formedness does not need to include subject and subject-type grade vectors, as it considers only the well-formedness of the assumptions in a context with respect to prior assumptions in the context. The wf<sub>∅</sub> rule states that the empty context is well-formed with an empty context grade vector, as there are no assumptions to account for. The wf<sub>Ext</sub> rule states that given A is a type under the assumptions in Γ, with σ accounting for the usage of Γ variables in A, and Δ accounting for usage within Γ, then we can form the well-formed context Γ, x : A by extending Δ with σ to account for the usage of A in forming the context. The notation **0** denotes a vector for which each element is the semiring 0. Note that the well-formedness Δ ⊙ Γ ⊢ is inherent from the premise of wf<sub>Ext</sub> due to the following lemma:

**Lemma 1 (Typing contexts are well-formed).** If (Δ | σ<sup>1</sup> | σ2) ⊙ Γ ⊢ t : A then Δ ⊙ Γ ⊢.

### **2.3 Typing Rules**

We examine the typing rules of Grtt one at a time.

Variables are introduced as follows:

$$\frac{(\Delta\_1, \sigma, \Delta\_2) \odot \Gamma\_1, x:A, \Gamma\_2 \vdash \quad |\Delta\_1| = |\varGamma\_1|}{(\Delta\_1, \sigma, \Delta\_2 \mid \mathbf{0}^{|\varDelta\_1|}, 1, \mathbf{0} \mid \sigma, 0, \mathbf{0}) \odot \Gamma\_1, x:A, \Gamma\_2 \vdash x:A} \text{ VAR}$$

The premise identifies Γ1, x : A, Γ2 as well-formed under the context grade vector Δ1, σ, Δ2. By the size condition |Δ1| = |Γ1|, we are able to identify σ as capturing the usage of the variables Γ1 in forming A. This information is used in the conclusion, capturing type-level variable usage as σ, 0, **0**, which describes that Γ1 is used according to σ in the subject's type (A), and that x and the variables of Γ2 are used with grade 0. For subject usage, we annotate the first zero vector with its size |Δ1|, allowing us to single out x as the only assumption used with grade 1 in the subject; all other assumptions are used with grade 0.

For example, typing the body of the polymorphic identity ends with Var:

$$\frac{\dfrac{\vdots}{((), (1)) \odot a : \mathsf{Type}, x : a \vdash}\ \mathsf{wf}\_{\mathsf{Ext}} \qquad |(())| = |a : \mathsf{Type}|}{((), (1) \mid 0, 1 \mid 1, 0) \odot a : \mathsf{Type}, x : a \vdash x : a}\ \mathrm{VAR}$$

The premise implies that ((()) | 1 | 0) ⊙ a : Type ⊢ a : Type by the following lemma:

**Lemma 2 (Typing an assumption in a well-formed context).** If Δ1, σ, Δ2 ⊙ Γ1, x : A, Γ2 ⊢ with |Δ1| = |Γ1|, then (Δ1 | σ | **0**) ⊙ Γ1 ⊢ A : Type<sup>l</sup> for some l.

In the conclusion of Var, the typing ((()) | 1 | 0) ⊙ a : Type ⊢ a : Type is 'distributed' to the typing of x in the context and to the formation of the subject's type. Thus subject grade (0, 1) corresponds to the absence of a from the subject and the presence of x, and subject-type grade (1, 0) corresponds to the presence of a in the subject's type (a), and the absence of x.

Typing universes are formed as follows:

$$\frac{\Delta \odot \Gamma \vdash}{(\Delta \mid \mathbf{0} \mid \mathbf{0}) \odot \Gamma \vdash \mathsf{Type}\_l : \mathsf{Type}\_{\mathsf{suc}\,l}} \mathsf{Type}$$

We use an inductive hierarchy of universes [47] with ordering < such that l < suc l. Universes can be formed under any well-formed context, with every assumption graded with 0 subject and subject-type use, capturing the absence of any assumptions from the universes, which are closed forms.

Functions Function types (x :(s,r) A) → B are annotated with two grades, explaining that x is used with grade s in the body of the inhabiting function and with grade r in B. Function types have the following formation rule:

$$\frac{(\Delta \mid \sigma\_1 \mid \mathbf{0}) \odot \Gamma \vdash A : \mathsf{Type}\_{l\_1} \quad (\Delta, \sigma\_1 \mid \sigma\_2, r \mid \mathbf{0}) \odot \Gamma, x : A \vdash B : \mathsf{Type}\_{l\_2}}{(\Delta \mid \sigma\_1 + \sigma\_2 \mid \mathbf{0}) \odot \Gamma \vdash (x :\_{\left(s, r\right)} A) \to B : \mathsf{Type}\_{l\_1 \sqcup l\_2}} \to
$$

The usage of the dependencies of A and B (excepting x) is given by σ1 and σ2 in the premises (in the 'subject' position), which are combined as σ1 + σ2 (via pointwise vector addition using the + of the semiring), serving to contract the dependencies of the two types. The usage of x in B is captured by r, and then internalised to the binder in the conclusion of the rule. An arbitrary grade for s is allowed here as there is no information on how x is used in an inhabiting function body. Function terms are then typed by the following rule:

$$\frac{(\Delta,\sigma\_1 \mid \sigma\_3,r \mid \mathbf{0}) \odot \Gamma, x:A \vdash B : \mathsf{Type}\_l \quad (\Delta,\sigma\_1 \mid \sigma\_2,s \mid \sigma\_3,r) \odot \Gamma, x:A \vdash t:B}{(\Delta \mid \sigma\_2 \mid \sigma\_1 + \sigma\_3) \odot \Gamma \vdash \lambda x.t:(x:\_{(s,r)} A) \to B} \ \lambda\_i$$

The second premise types the body of the λ-term, showing that s captures the usage of x in t and r captures the usage of x in B; the subject and subject-type grades of x are then internalised as annotations on the function type's binder.

Dependent functions are eliminated through application:

$$\frac{\begin{array}{l}(\Delta,\sigma\_{1}\mid\sigma\_{3},r\mid\mathbf{0})\odot\Gamma,x:A\vdash B:\mathsf{Type}\_{l}\\(\Delta\mid\sigma\_{2}\mid\sigma\_{1}+\sigma\_{3})\odot\Gamma\vdash t\_{1}:(x\colon\_{(s,r)}A)\to B\quad(\Delta\mid\sigma\_{4}\mid\sigma\_{1})\odot\Gamma\vdash t\_{2}:A\end{array}}{(\Delta\mid\sigma\_{2}+s\*\sigma\_{4}\mid\sigma\_{3}+r\*\sigma\_{4})\odot\Gamma\vdash t\_{1}\,t\_{2}:[t\_{2}/x]B}\ \lambda\_{e}$$

where ∗ is the scalar multiplication of a vector, using the semiring multiplication. Given a function t1 which uses its parameter with grade s to compute and with grade r in the typing of the result, we can apply it to a term t2, provided that we have the resources required to form t2 scaled by s at the subject level and by r at the subject-type level, since t2 is substituted into the return type B. This scaling behaviour is akin to that used in coeffect calculi [25,49], Qtt [6,43] and Linear Haskell [8], but scalar multiplication happens here at both the subject and subject-type level. The use of variables in A is accounted for by σ1 as explained in the third premise, but these usages are not present in the resulting application since A no longer appears in the types or the terms.

Consider the constant function λx.λy.x : (x :(1,0) A) → (y :(0,0) B) → A (for some A and B). Here the resources required for the second parameter will always be scaled by 0, which is absorbing, meaning that anything passed as the second argument has 0 subject and subject-type use. This example begins to show some of the power of grading—the grades capture the program structure at all levels.

Tensors The rule for forming dependent tensor types is as follows:

$$\frac{(\Delta \mid \sigma\_1 \mid \mathbf{0}) \odot \Gamma \vdash A : \mathsf{Type}\_l \quad (\Delta, \sigma\_1 \mid \sigma\_2, r \mid \mathbf{0}) \odot \Gamma, x : A \vdash B : \mathsf{Type}\_l}{(\Delta \mid \sigma\_1 + \sigma\_2 \mid \mathbf{0}) \odot \Gamma \vdash (x :\_r A) \otimes B : \mathsf{Type}\_l} \otimes$$

This rule is almost identical to function type formation (→) but with only a single grade r on the binder, since x is only bound in B (the type of the second component), and not computationally. For 'quantitative' semirings, where 0 really means unused (see Section 3), (x :_0 A) ⊗ B is then a product A × B.

Dependent tensors are introduced as follows:

$$\frac{\begin{array}{l}(\Delta,\sigma\_{1}\mid\sigma\_{3},r\mid\mathbf{0})\odot\Gamma,x:A\vdash B:\mathsf{Type}\_{l}\\(\Delta\mid\sigma\_{2}\mid\sigma\_{1})\odot\Gamma\vdash t\_{1}:A\qquad(\Delta\mid\sigma\_{4}\mid\sigma\_{3}+r\*\sigma\_{2})\odot\Gamma\vdash t\_{2}:[t\_{1}/x]B\end{array}}{(\Delta\mid\sigma\_{2}+\sigma\_{4}\mid\sigma\_{1}+\sigma\_{3})\odot\Gamma\vdash(t\_{1},t\_{2}):(x:\_{r}A)\otimes B}\ \otimes\_{i}$$

In the typing premise for t2, occurrences of x are replaced with t1 in the type, ensuring that the type of the second component (t2) is calculated using the first component (t1). The resources for t1 in this substitution are scaled by r, accounting for the existing usage of x in B. In the conclusion, we see the resources for the two components (and their types) combined via the semiring addition.

Finally, tensors are eliminated with the following rule:

$$\frac{\begin{array}{l} (\Delta \mid \sigma\_3 \mid \sigma\_1 + \sigma\_2) \odot \Gamma \vdash t\_1 : (x :\_{r} A) \otimes B\\ (\Delta, (\sigma\_1 + \sigma\_2) \mid \sigma\_5, r' \mid \mathbf{0}) \odot \Gamma, z : (x :\_{r} A) \otimes B \vdash C : \mathsf{Type}\_l\\ (\Delta, \sigma\_1, (\sigma\_2, r) \mid \sigma\_4, s, s \mid \sigma\_5, r', r') \odot \Gamma, x : A, y : B \vdash t\_2 : [(x, y)/z]C \end{array}}{(\Delta \mid \sigma\_4 + s \* \sigma\_3 \mid \sigma\_5 + r' \* \sigma\_3) \odot \Gamma \vdash \mathsf{let}\,(x, y) = t\_1 \,\mathsf{in}\, t\_2 : [t\_1/z]C}\ \otimes\_{e}$$

As this is a dependent eliminator, we allow the result type C to depend upon the value of the tensor as a whole, bound as z in the second premise with grade r′, into which our actual tensor term t1 is substituted in the conclusion.

Eliminating a tensor (t1) requires that each component (x and y) be used with the same grade s in the resulting expression t2, and that we scale the resources of t1 by s. This is because we cannot inspect t1 itself, and semiring addition is not injective (preventing us from splitting the grades required to form t1). This prevents forming certain functions (e.g., projections) under some semirings, but this can be overcome by the introduction of graded modalities.

Graded Modality Graded binders alone do not allow different parts of a value to be used differently, e.g., computing the length of a list ignores the elements; projecting from a pair discards one component. We therefore introduce a graded modality (à la [10,46]) allowing us to capture the notion of local inspection on data and internalising usage information into types. A type □_s A denotes terms of type A that are used with grade s. Type formation and introduction rules are:

$$\frac{(\Delta \mid \sigma \mid \mathbf{0}) \odot \Gamma \vdash A : \mathsf{Type}\_{l}}{(\Delta \mid \sigma \mid \mathbf{0}) \odot \Gamma \vdash \Box\_{s} A : \mathsf{Type}\_{l}} \Box \quad \frac{(\Delta \mid \sigma\_{1} \mid \sigma\_{2}) \odot \Gamma \vdash t : A}{(\Delta \mid s \* \sigma\_{1} \mid \sigma\_{2}) \odot \Gamma \vdash \Box t : \Box\_{s} A} \Box\_{i}$$

To form a term of type □_s A, we 'promote' a term t of type A to □t by requiring that we can use the resources used to form t (σ1) according to grade s. This 'promotion' resembles that of other graded modal systems (e.g., [3,10,23,46]), but the elimination must also account for type usage due to dependent elimination.

We can see promotion □t as capturing t for later use according to grade s. Thus, when eliminating a term of type □_s A, we must consider how the 'unboxed' term is used with grade s, as per the following dependent eliminator:

$$\frac{\begin{array}{c} (\Delta,\sigma\_{2}\mid\sigma\_{4},r\mid\mathbf{0})\odot\Gamma,z:\square\_{s}A\vdash B : \mathsf{Type}\_{l} \\ (\Delta\mid\sigma\_{1}\mid\sigma\_{2})\odot\Gamma\vdash t\_{1}:\square\_{s}A \quad (\Delta,\sigma\_{2}\mid\sigma\_{3},s\mid\sigma\_{4},(s\ast r))\odot\Gamma,x:A\vdash t\_{2}:[\square x/z]B \end{array}}{(\Delta\mid\sigma\_{1}+\sigma\_{3}\mid\sigma\_{4}+r\ast\sigma\_{1})\odot\Gamma\vdash \mathsf{let}\,\Box x=t\_{1}\ \mathsf{in}\ t\_{2}:[t\_{1}/z]B}\ \square\_{e}$$

This rule can be understood as a kind of 'cut', connecting a 'capability' to use a term of type A according to grade s with the requirement that x : A is used according to grade s as a dependency of t2. Since we are in a dependently-typed setting, we also substitute t1 into the type level such that B can depend on t1 according to grade r, which then causes the dependencies of t1 (σ1) to be scaled up by r and added to the subject-type grading.

Equality, Conversion, and Subtyping A key part of dependent type theories is a notion of term equality and type conversion [33]. Grtt term equality is via judgments (Δ | σ1 | σ2) ⊙ Γ ⊢ t1 = t2 : A equating terms t1 and t2 of type A. Equality includes full congruences as well as βη-equality for functions, tensors, and graded modalities, the latter given by:

$$\frac{\begin{array}{c}(\Delta, \sigma\_2 \mid \sigma\_4, r \mid \mathbf{0}) \odot \Gamma, z : \square\_s A \vdash B : \mathsf{Type}\_l \qquad (\Delta \mid \sigma\_1 \mid \sigma\_2) \odot \Gamma \vdash t\_1 : A \\ (\Delta, \sigma\_2 \mid \sigma\_3, s \mid \sigma\_4, (s \ast r)) \odot \Gamma, x : A \vdash t\_2 : [\square x/z]B\end{array}}{(\Delta \mid \sigma\_3 + s \ast \sigma\_1 \mid \sigma\_4 + s \ast r \ast \sigma\_1) \odot \Gamma \vdash (\mathsf{let}\,\square x = \square t\_1\,\mathsf{in}\,t\_2) = [t\_1/x]t\_2 : [\square t\_1/z]B}\ \mathsf{Eq}\_c$$

$$\frac{(\Delta \mid \sigma\_1 \mid \sigma\_2) \odot \Gamma \vdash t : \square\_s A}{(\Delta \mid \sigma\_1 \mid \sigma\_2) \odot \Gamma \vdash t = (\mathsf{let}\,\square x = t\,\mathsf{in}\,\square x) : \square\_s A}\ \mathsf{Eq}\_u$$

A subtyping relation ((Δ | σ) ⊙ Γ ⊢ A ≤ B) subsumes equality, adding ordering of universe levels. Type conversion allows re-typing terms based on the judgment:

$$\frac{(\Delta \mid \sigma\_1 \mid \sigma\_2) \odot \Gamma \vdash t : A \quad (\Delta \mid \sigma\_2) \odot \Gamma \vdash A \leq B}{(\Delta \mid \sigma\_1 \mid \sigma\_2) \odot \Gamma \vdash t : B} \text{Conv}$$

The full rules for equality and subtyping are in this paper's extended version [44].

# **2.4 Operational Semantics**

As with other graded modal calculi (e.g., [3,10,23]), the core calculus of Grtt has a call-by-name small-step operational semantics with reductions t ⇝ t′. The rules are standard, with the addition of the β-rule for the graded modality:

$$\mathsf{let}\,\Box x = \Box t\_1 \,\mathsf{in}\,t\_2 \leadsto [t\_1/x]t\_2 \qquad \text{(}\beta\Box\text{)}$$

Type preservation and normalisation are considered in Section 4.
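The (β□) rule above can be illustrated with a small interpreter sketch (the term representation is ours, not Gerty's): a fragment of the term language with the graded modality, capture-naive substitution, and a one-step call-by-name reducer.

```python
# Terms: ("var", x) | ("box", t) | ("letbox", x, t1, t2)
# representing x, box t, and "let box x = t1 in t2".

def subst(x, t, u):
    """Capture-naive substitution [t/x]u (assumes distinct bound names)."""
    tag = u[0]
    if tag == "var":
        return t if u[1] == x else u
    if tag == "box":
        return ("box", subst(x, t, u[1]))
    if tag == "letbox":
        _, y, u1, u2 = u
        u2p = u2 if y == x else subst(x, t, u2)  # binder y shadows x
        return ("letbox", y, subst(x, t, u1), u2p)
    raise ValueError(tag)

def step(t):
    """One CBN step; returns None when no rule applies."""
    if t[0] == "letbox":
        _, x, t1, t2 = t
        if t1[0] == "box":            # (beta-box): let box x = box t1 in t2
            return subst(x, t1[1], t2)
        t1p = step(t1)                # otherwise reduce the scrutinee
        if t1p is not None:
            return ("letbox", x, t1p, t2)
    return None

print(step(("letbox", "x", ("box", ("var", "a")), ("box", ("var", "x")))))
# ('box', ('var', 'a'))
```

The only non-standard case is the (β□) clause; everything else is ordinary call-by-name plumbing.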

### **2.5 Implementation and Examples**

To explore our theory, we provide an implementation, **Gerty**. Section 5 describes how the declarative definition of the type theory is implemented as a bidirectional type checking algorithm. We briefly mention the syntax here for use in later examples. The following is the polymorphic identity function in **Gerty**:

```
id : (a : (.0, .2) Type 0) -> (x : (.1, .0) a) -> a
id = \a -> \x -> x
```

The syntax resembles the theory, where grading terms .n are syntactic sugar for a unary encoding of grades in terms of 0 and repeated addition of 1, e.g., .2 = (.0 + .1)+.1. This syntax can be used for grade terms of any semiring, which can be resolved to particular built-in semirings at other points of type checking.
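The desugaring of `.n` can be sketched as follows (our own illustration, not **Gerty**'s implementation): build the surface-syntax string by repeatedly adding `.1` to `.0`.

```python
def unary_grade(n):
    """Render the grade .n as repeated addition of .1 starting from .0,
    mirroring the example .2 = (.0 + .1) + .1."""
    term = ".0"
    for _ in range(n):
        term = f"({term} + .1)"
    return term

print(unary_grade(2))  # ((.0 + .1) + .1)
```

Because the encoding only uses 0 and repeated addition of 1, it is meaningful in any semiring, matching the remark above.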

The following shows first projection on (non-dependent) pairs, using the graded modality (at grade 0 here) to give fine-grained usage on compound data:

```
fst : (a : (.0, .2) Type 0) (b : (.0, .1) Type 0) -> <a * [.0] b> -> a
fst = \a b p -> case p of <x, y> -> let [z] = y in x
```

The implementation adds various built-in semirings, some syntactic sugar, and extras such as: a singleton unit type, extensions of the theory to semirings with a pre-ordering (discussed further in Section 6), and some implicit resolution. Anywhere a grade is expected, an underscore can be supplied to indicate that **Gerty** should try to resolve the grade implicitly. Grades may also be omitted from binders (see above in fst), in which case they are treated as implicits. Currently, implicits are handled by generating existentially quantified grade variables, and using SMT to solve the necessary constraints (see Section 5).

So far we have considered the natural numbers semiring providing an analysis of usage. We come back to this and similar examples in Section 3. To show another kind of example, we consider a lattice semiring of privacy levels (appearing elsewhere [3,23,46]) which enforces information-flow control, akin to DCC [1]. Differently to DCC, dataflow is tracked through variable dependencies, rather than through the results of computations in the monadic style of DCC.

**Definition 1.** [Security levels] Let R = {Lo, Hi} be a set of labels ordered Lo ≤ Hi, with 0 = Hi and 1 = Lo, semiring addition as meet and multiplication as join. Here, 1 = Lo treats the base notion of dataflow as being in the low-security (public) domain. Variables graded with Hi must then be unused, or guarded by a graded modality. This semiring is primitive in **Gerty**; we can express the following example:
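A minimal sketch of this semiring (our own encoding, not **Gerty**'s): representing Lo as 0 and Hi as 1 so that `min`/`max` realise meet/join under the order Lo ≤ Hi.

```python
LO, HI = 0, 1  # encoded so that LO <= HI numerically

def radd(r, s):
    """Semiring addition = meet (lower, more public, level)."""
    return min(r, s)

def rmul(r, s):
    """Semiring multiplication = join (higher, more secret, level)."""
    return max(r, s)

# 0 = Hi is the additive unit, 1 = Lo the multiplicative unit,
# and multiplying by Hi absorbs: data graded Hi stays high.
print(radd(LO, HI), rmul(LO, HI))  # 0 1, i.e. Lo and Hi
```

Note how rmul(HI, r) == HI for any r: scaling a dependency by Hi forces it into the high-security domain, which is exactly the absorbing behaviour used by the `leak` example below.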

```
idLo : (a : (.0, .2) Type 0) -> (x : (Lo, Hi) a) -> a
idLo = \a -> \x -> x

-- The following is rejected as ill-typed
leak : (a : (.0, .2) Type 0) -> (x : (Hi, Hi) a) -> a
leak = \a -> \x -> idLo a x
```

The first definition is well-typed, but the second yields a typing error originating from the application in its body:

```
At subject stage got the following mismatched grades:
 For 'x' expected Hi but got .1
```
where grade 1 is Lo here. Thus we can use this abstract label semiring as a way of restricting flow of data between regions (cf. region typing systems [31,55]). Note that the ordering is not leveraged here other than in the lattice operations.

# **3 Case Studies**

We now demonstrate Grtt via several case studies that exercise the reasoning power of dependent types via grading. Since grading in Grtt serves to explain dataflow, we can characterise subsets of Grtt that correspond to various type theories. We demonstrate the approach with simple types, parametric polymorphism, and linearity. In each case study, we restrict Grtt to a subset by a characterisation of the grades, rather than by, say, placing detailed syntactic restrictions or employing meta-level operations or predicates that restrict syntax (as one might do, for example, to map a subset of Martin-Löf type theory into the simply-typed λ-calculus by restriction to closed types, requiring deep inspection of type terms). Since this restriction is only on grades, we can harness the specific reasoning power of particular calculi from within the language itself, simply by specifications on grades. In the context of an implementation like **Gerty**, this amounts to using type signatures to restrict dataflow.

This section shows the power of tracking dataflow in types via grades, going beyond Qtt [6] and GraD [13]. For 'quantitative' semirings, a 0 type-grade means that we can recover simply-typed reasoning (Section 3.3) and distinguish computational functions from type-parameter functions for parametric reasoning (Section 3.4), embedding a grade-restricted subset of Grtt into System F.

Section 5 returns to a case study that builds on the implementation.

### **3.1 Recovering Martin-Löf Type Theory**

When the semiring parameterising Grtt is the singleton semiring (i.e., any semiring where 1 = 0), we have an isomorphism □_r A ≅ A, and grade annotations become redundant, as all grades are equal. All vectors and grades on binders may then be omitted, and we can write typing judgments as Γ ⊢ t : A, giving rise to a standard Martin-Löf type theory as a special case of Grtt.

### **3.2 Determining Usage via Quantitative Semirings**

Unlike existing systems, we can use the fine-grained grading to guarantee the relevance or irrelevance of assumptions in types. To do this we must consider a subset of semirings (R, ∗, 1, +, 0) called quantitative semirings, satisfying:

(zero-unique) 1 ≠ 0; (positivity) ∀r, s. r + s = 0 ⟹ r = 0 ∧ s = 0; (zero-product) ∀r, s. r ∗ s = 0 ⟹ r = 0 ∨ s = 0.

These axioms<sup>4</sup> ensure that a 0-grade in a quantitative semiring represents irrelevant variable use. This notion has recently been proved for computational use by Choudhury et al. [13] via a heap-based semantics for grading (on computations), and the same result applies here. Conversely, in a quantitative semiring any grade other than 0 denotes relevance. From this, we can directly encode non-dependent tensors and arrows: in (x :_0 A) ⊗ B the grade 0 captures that x cannot have any computational content in B, and likewise for (x :(s,0) A) → B the grade 0 explains that x cannot have any computational content in B, but may have computational use according to s in the inhabiting function. Thus,

<sup>4</sup> Atkey requires positivity and zero-product for all semirings parameterising Qtt [6] (as does Abel [2]). Atkey imposes this for admissibility of substitution. We need not place this restriction on Grtt to have substitution in general (Sec. 4.1).

the grade 0 here describes that elimination forms cannot ever inspect the variable during normalisation. Additionally, quantitative semirings can be used for encoding simply-typed and polymorphic reasoning.

Example 1. Some quantitative semirings are:
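As a sanity sketch (ours, not from the paper's development), the natural numbers semiring used earlier for usage counting can be checked against the three axioms exhaustively over a small range:

```python
# Check zero-unique, positivity, and zero-product for (N, *, 1, +, 0)
# over the range 0..20; for naturals these hold for all values.
N = range(21)

zero_unique = (1 != 0)
positivity = all((r == 0 and s == 0) for r in N for s in N if r + s == 0)
zero_product = all((r == 0 or s == 0) for r in N for s in N if r * s == 0)

print(zero_unique and positivity and zero_product)  # True
```

By contrast, the singleton semiring of Section 3.1 fails zero-unique by definition, so it is not quantitative.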


### **3.3 Simply-typed Reasoning**

As discussed in Section 1, the simply-typed λ-calculus (STLC) can be distinguished from dependently-typed calculi via the restriction of dataflow: in simple types, data can only flow at the computational level, with no dataflow within, into, or from types. We can thus view a Grtt function as simply typed when its variable is irrelevant in the type, e.g., (x :(s,0) A) → B for quantitative semirings. We define a subset of Grtt restricted to simply-typed reasoning:

**Definition 2.** [Simply-typed Grtt] For a quantitative semiring, the following predicate Stlc(−) determines a subset of simply-typed Grtt programs:

$$\begin{aligned} &\operatorname{Stlc}((\emptyset \mid \emptyset \mid \emptyset) \odot \emptyset \vdash t : A) \\ &\operatorname{Stlc}((\Delta \mid \sigma\_1 \mid \sigma\_2) \odot \Gamma \vdash t : A) \Longrightarrow \operatorname{Stlc}((\Delta, \mathbf{0} \mid \sigma\_1, s \mid \sigma\_2, 0) \odot \Gamma, x : B \vdash t : A) \end{aligned}$$

That is, all subject-type grades are 0 (thus function types are of the form (x :(s,0) A) → B). A similar predicate is defined on well-formed contexts (elided), restricting context grades of well-formed contexts to only zero grading vectors.
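The check of Definition 2 can be sketched as a simple predicate over a judgment's grading (the representation here is our own, using the natural numbers semiring):

```python
def is_stlc(delta, sigma1, sigma2):
    """delta: list of context grade vectors; sigma1/sigma2: subject and
    subject-type grade vectors. Simply-typed fragment: all subject-type
    grades are 0 and all context grade vectors are zero vectors."""
    return all(g == 0 for g in sigma2) and \
           all(all(g == 0 for g in vec) for vec in delta)

# A judgment with zero subject-type grades is in the STLC fragment;
# any nonzero subject-type grade (type-level dataflow) is not.
print(is_stlc([[], [0]], [0, 1], [0, 0]))  # True
print(is_stlc([[], [0]], [0, 1], [1, 0]))  # False
```

The subject grades (sigma1) are unconstrained, matching the observation that s may be arbitrary in (x :(s,0) A) → B.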

Under the restriction of Definition 2, a subset of Grtt terms embeds into the simply-typed λ-calculus in a sound and complete way. Since STLC does not have a notion of tensor or modality, this is omitted from the encoding:

$$[\![x]\!] = x \qquad [\![\lambda x.t]\!] = \lambda x.[\![t]\!] \qquad [\![t\_1\,t\_2]\!] = [\![t\_1]\!]\,[\![t\_2]\!] \qquad [\![(x :\_{(s,0)} A) \to B]\!]\_\tau = [\![A]\!]\_\tau \to [\![B]\!]\_\tau$$

Variable contexts of Grtt are interpreted by applying ⟦−⟧τ pointwise to typing assumptions. We then get the following preservation of typing into the simply-typed λ-calculus, and soundness and completeness of this encoding:

**Lemma 3 (Soundness of typing).** Given a derivation of (Δ | σ1 | σ2) ⊙ Γ ⊢ t : A such that Stlc((Δ | σ1 | σ2) ⊙ Γ ⊢ t : A), then ⟦Γ⟧τ ⊢ ⟦t⟧ : ⟦A⟧τ in STLC.

**Theorem 1 (Soundness and completeness of the embedding).** Given Stlc((Δ | σ1 | σ2) ⊙ Γ ⊢ t : A) and its encoding ⟦(Δ | σ1 | σ2) ⊙ Γ ⊢ t : A⟧, then for CBN reduction ⇝stlc in the simply-typed λ-calculus:

$$\begin{array}{l}(\textit{soundness})\ \forall t'.\ \textit{if}\ t \leadsto t'\ \textit{then}\ [\![t]\!] \leadsto^{\mathrm{stlc}} [\![t']\!]\\(\textit{completeness})\ \forall t\_a.\ \textit{if}\ [\![t]\!] \leadsto^{\mathrm{stlc}} t\_a\ \textit{then}\ \exists t'.\ t \leadsto t'\ \land\ [\![t']\!] \equiv\_{\beta\eta} t\_a\end{array}$$

Thus, we capture simply-typed reasoning just by restricting type grades to 0 for quantitative semirings. We consider quantitative semirings again for parametric reasoning, but first recall issues with parametricity and dependent types.

### **3.4 Recovering Parametricity via Grading**

One powerful feature of grading in a dependent type setting is the ability to recover parametricity from dependent function types. Consider the following type of functions in System F (we borrow this example from Nuyts et al. [45]):

$$\mathsf{RI}\ A\ B \stackrel{\Delta}{=} \forall \gamma. (\gamma \to A) \to (\gamma \to B)$$

Due to parametricity, we get the following notion of representation independence in System F: for a function f : RI A B, some type γ, and terms h : γ → A and c : γ, we know that f can only use c by applying h c. Subsequently, RI A B ≅ A → B by parametricity [52], with the isomorphism defined uniquely as:

$$\begin{array}{ll} iso: \mathsf{RI}\ A\ B \to (A \to B) & iso^{-1}: (A \to B) \to \mathsf{RI}\ A\ B\\ iso\ f = f\ A\ (\mathrm{id}\ A) & iso^{-1}\ g = \Lambda\gamma.\ \lambda h.\ \lambda(c:\gamma).\ g\ (h\ c) \end{array}$$

In a dependently-typed language, one might seek to replace System F's universal quantifier with Π-types, i.e.

$$\mathsf{Rl}'A\ B \triangleq (\gamma : \mathsf{Type}) \to (\gamma \to A) \to (\gamma \to B)$$

However, we can no longer reason parametrically about the inhabitants of such types (we cannot prove that RI′ A B ≅ A → B), as the free interaction of types and computational terms allows us to give the following non-parametric element of RI′ A Type over 'large' type instances:

$$\mathit{leak} = \lambda \gamma.\; \lambda h.\; \lambda c.\; \gamma \;:\; \mathsf{RI}'\ A\ \mathsf{Type}$$

Instead of applying h c, the above "leaks" the type parameter γ. Grtt can recover universal quantification, and hence parametric reasoning, by using grading to restrict the data-flow capabilities of a Π-type. We can refine representation independence to the following:

$$\mathsf{RI}''\ A\ B \triangleq (\gamma :\_{(0,2)} \mathsf{Type}) \to (h :\_{(s\_1,0)} (x :\_{(s\_2,0)} \gamma) \to A) \to (c :\_{(s\_3,0)} \gamma) \to B$$

for some grades s1, s2, and s3, and with shorthand 2 = 1 + 1.

If we look at the definition of leak above, we see that γ is used in the body of the function and thus requires usage 1, so leak cannot inhabit RI″ A Type. Instead, leak would be typed differently as:

$$\text{leak} \colon (\gamma :\_{(1,2)} \mathsf{Type}) \to (h :\_{(0,0)} (x :\_{(s,0)} \gamma) \to A) \to (c :\_{(0,0)} \gamma) \to \mathsf{Type}$$

The problematic behaviour (that the type parameter γ is returned by the inner function) is exposed by the subject grade 1 on the binder of γ. We can thus define a graded universal quantification from a graded Π-type:

$$\forall\_r (\gamma:A).B \stackrel{\Delta}{=} (\gamma:\_{(0,r)}A) \to B \tag{1}$$

This denotes that the type parameter γ can appear freely in B described by grade r, but is irrelevant in the body of any corresponding λ-abstraction. This is akin to the work of Nuyts et al. who develop a system with several modalities for regaining parametricity within a dependent type theory [45]. Note however that parametricity is recovered for us here as one of many possible options coming from systematically specialising the grading.

Capturing Existential Types With the ability to capture universal quantification, we can similarly define existentials (allowing, e.g., abstraction [11]). We define the existential type via a Church encoding as follows:

$$\exists\_r (x:A).B \triangleq \forall\_2 (C:\mathsf{Type}\_l).\,(f:\_{(1,0)} \forall\_r (x:A).(b:\_{(s,0)} B) \to C) \to C$$

Embedding into Stratified System F We show that parametricity is regained here (and thus eqn. (1) really behaves as a universal quantifier and not a general Π-type) by showing that we can embed a subset of Grtt into System F, based solely on a classification of the grades. We follow a similar approach to Section 3.3 for simply-typed reasoning, but rather than defining a purely syntactic encoding (and then proving it type sound), our encoding is type directed, since we embed Grtt functions of type (x :(0,r) Type_l) → B as universal types in System F with corresponding type abstractions (Λ) as their inhabitants. Since Grtt employs a predicative hierarchy of universes, we target Stratified System F (hereafter SSF), since it includes the analogous inductive hierarchy of kinds [38]. We use the formulation of Eades and Stump [21] with terms t_s and types T:

$$t\_s ::= x \mid \lambda(x:T).t\_s \mid t\_s \; t\_s' \mid \Lambda(X:K).t\_s \mid t\_s \; [T] \qquad T::=X \mid T \to T' \mid \forall (X:K).T$$

with kinds K ::= ∗_l where l ∈ ℕ providing the stratified kind hierarchy. Capitalised variables X are System F type variables and t_s [T] is type application. Contexts may contain both type and computational variables, and so free-variable type assumptions may have dependencies, akin to dependent type systems. Kinding is via judgments Γ ⊢ T : ∗_l and typing via Γ ⊢ t : T.

We define a type directed encoding on a subset of Grtt typing derivations characterised by the following predicate:

$$\begin{aligned} &\operatorname{Ssf}((\emptyset \mid \emptyset \mid \emptyset) \odot \emptyset \vdash t : A) \\ &\operatorname{Ssf}((\Delta \mid \sigma\_1 \mid \sigma\_2) \odot \varGamma \vdash t : A) \implies \operatorname{Ssf}((\Delta, \mathbf{0} \mid \sigma\_1, 0 \mid \sigma\_2, r) \odot \varGamma, x : \mathsf{Type}\_l \vdash t : A) \\ &\operatorname{Ssf}((\Delta \mid \sigma\_1 \mid \sigma\_2) \odot \varGamma \vdash t : A) \land \mathsf{Type}\_l \not\in^{+ve} B \\ &\qquad \implies \operatorname{Ssf}((\Delta, \sigma\_3 \mid \sigma\_1, s \mid \sigma\_2, 0) \odot \varGamma, x : B \vdash t : A) \end{aligned}$$

By Type_l ∉^{+ve} B we mean that Type_l is not a positive subterm of B, avoiding higher-order typing terms (e.g., type constructors) which do not exist in SSF.

Under this restriction, we give a type-directed encoding mapping derivations of Grtt to SSF: given a Grtt derivation of judgment (Δ | σ1 | σ2) ⊙ Γ ⊢ t : A, there exists an SSF term t_s such that there is a derivation of judgment ⟦Γ⟧ ⊢ t_s : ⟦A⟧_τ in SSF, where we interpret a subset of Grtt terms A as types:

$$\begin{array}{ll} \llbracket x \rrbracket\_\tau = x \qquad\qquad \llbracket \mathsf{Type}\_l \rrbracket\_\tau = \*\_l & \\ \llbracket (x :\_{(0,r)} \mathsf{Type}\_l) \to B \rrbracket\_\tau = \forall (x : \*\_l).\llbracket B \rrbracket\_\tau & \text{where } \mathsf{Type}\_l \not\in^{+ve} B \\ \llbracket (x :\_{(s,0)} A) \to B \rrbracket\_\tau = \llbracket A \rrbracket\_\tau \to \llbracket B \rrbracket\_\tau & \text{where } \mathsf{Type}\_l \not\in^{+ve} A, B \end{array}$$

Thus, dependent functions with Type parameters that are computationally irrelevant (subject grade 0) map to ∀ types, and dependent functions with parameters irrelevant in types (subject-type grade 0) map to regular function types.
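To make the two Π cases of the encoding concrete, here is a minimal sketch in Python (our own illustration, not the paper's artifact), using a hypothetical tuple-based AST for Grtt types and SSF types:

```python
# Sketch of the type part of the SSF encoding (hypothetical AST, names ours).
# Grtt types: ("var", x), ("type", l), or ("pi", x, (s, r), A, B)
#   where ("pi", x, (s, r), A, B) stands for (x :_(s,r) A) -> B.
# SSF types:  ("tyvar", x), ("arrow", A, B), ("forall", x, l, B).

def encode(ty):
    """Map a Grtt type into an SSF type, following the two Pi cases:
    (x :_(0,r) Type_l) -> B  becomes  forall (x : *_l). [[B]]
    (x :_(s,0) A) -> B       becomes  [[A]] -> [[B]]
    """
    tag = ty[0]
    if tag == "var":
        return ("tyvar", ty[1])
    if tag == "pi":
        _, x, (s, r), dom, cod = ty
        if dom[0] == "type" and s == 0:
            # computationally irrelevant Type parameter: a universal type
            return ("forall", x, dom[1], encode(cod))
        if r == 0:
            # parameter irrelevant in the type: an ordinary function space
            return ("arrow", encode(dom), encode(cod))
        raise ValueError("outside the Ssf fragment")
    raise ValueError("not interpretable as an SSF type")

# The polymorphic identity's type (a :_(0,1) Type_0) -> (x :_(1,0) a) -> a:
idty = ("pi", "a", (0, 1), ("type", 0),
        ("pi", "x", (1, 0), ("var", "a"), ("var", "a")))
```

Running `encode(idty)` yields the SSF type ∀(a : ∗_0). a → a, matching the intuition that the 0 subject grade on the Type binder marks a type abstraction.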

We elide the full details but sketch key parts where functions and applications are translated inductively (where Ty_l is shorthand for Type_l):

$$\llbracket \frac{(\Delta, \sigma\_1 \mid \sigma\_2, 0 \mid \sigma\_3, r) \odot \varGamma, x : \mathsf{Ty}\_l \vdash t : B}{(\Delta \mid \sigma\_2 \mid \sigma\_1 + \sigma\_3) \odot \varGamma \vdash \lambda x.t : (x :\_{(0,r)} \mathsf{Ty}\_l) \to B} \rrbracket = \frac{\llbracket\varGamma\rrbracket, x : \*\_l \vdash t\_s : \llbracket B\rrbracket\_\tau}{\llbracket\varGamma\rrbracket \vdash \Lambda(x : \*\_l).t\_s : \forall(x : \*\_l).\llbracket B\rrbracket\_\tau}$$

$$\llbracket \frac{(\Delta, \sigma\_1 \mid \sigma\_2, s \mid \sigma\_3, 0) \odot \varGamma, x : A \vdash t : B}{(\Delta \mid \sigma\_2 \mid \sigma\_1 + \sigma\_3) \odot \varGamma \vdash \lambda x.t : (x :\_{(s,0)} A) \to B} \rrbracket = \frac{\llbracket\varGamma\rrbracket, x : \llbracket A\rrbracket\_\tau \vdash t\_s : \llbracket B\rrbracket\_\tau}{\llbracket\varGamma\rrbracket \vdash \lambda(x : \llbracket A\rrbracket\_\tau).t\_s : \llbracket A\rrbracket\_\tau \to \llbracket B\rrbracket\_\tau}$$

$$\llbracket \frac{(\Delta \mid \sigma\_2 \mid \sigma\_1 + \sigma\_3) \odot \varGamma \vdash t\_1 : (x :\_{(0,r)} \mathsf{Ty}\_l) \to B \quad (\Delta \mid \sigma\_4 \mid \sigma\_1) \odot \varGamma \vdash t\_2 : \mathsf{Ty}\_l}{(\Delta \mid \sigma\_2 \mid \sigma\_3 + r \* \sigma\_4) \odot \varGamma \vdash t\_1\, t\_2 : [t\_2/x]B} \rrbracket = \frac{\llbracket\varGamma\rrbracket \vdash t\_s : \forall(x : \*\_l).\llbracket B\rrbracket\_\tau \quad \llbracket\varGamma\rrbracket \vdash T : \*\_l}{\llbracket\varGamma\rrbracket \vdash t\_s[T] : [T/x]\llbracket B\rrbracket\_\tau}$$

$$\llbracket \frac{(\Delta \mid \sigma\_2 \mid \sigma\_1 + \sigma\_3) \odot \varGamma \vdash t\_1 : (x :\_{(s,0)} A) \to B \quad (\Delta \mid \sigma\_4 \mid \sigma\_1) \odot \varGamma \vdash t\_2 : A}{(\Delta \mid \sigma\_2 + s \* \sigma\_4 \mid \sigma\_3) \odot \varGamma \vdash t\_1\, t\_2 : [t\_2/x]B} \rrbracket = \frac{\llbracket\varGamma\rrbracket \vdash t\_s : \llbracket A\rrbracket\_\tau \to \llbracket B\rrbracket\_\tau \quad \llbracket\varGamma\rrbracket \vdash t'\_s : \llbracket A\rrbracket\_\tau}{\llbracket\varGamma\rrbracket \vdash t\_s\, t'\_s : [t'\_s/x]\llbracket B\rrbracket\_\tau}$$

In the last case, note the presence of [t'_s/x]⟦B⟧_τ. Reasoning under the context of the encoding, this is proven equivalent to ⟦B⟧_τ since the subject-type grade is 0 and therefore use of x in B is irrelevant.

**Theorem 2 (Soundness and completeness of SSF embedding).** Given Ssf((Δ | σ1 | σ2) ⊙ Γ ⊢ t : A) and t_s in SSF where ⟦(Δ | σ1 | σ2) ⊙ Γ ⊢ t : A⟧ = ⟦Γ⟧ ⊢ t_s : ⟦A⟧_τ, then for CBN reduction ⇝^SSF in Stratified System F:

$$\begin{array}{c} (soundness)\ \forall t'.\ t\leadsto t'\implies\exists t'\_s.\ t\_s\leadsto^{\text{SSF}}t'\_s\\ \land\ \llbracket(\Delta\mid\sigma\_1\mid\sigma\_2)\odot\varGamma\vdash t':A\rrbracket = \llbracket\varGamma\rrbracket\vdash t'\_s:\llbracket A\rrbracket\_\tau\\ (completeness)\ \forall t'\_s.\ t\_s\leadsto^{\text{SSF}}t'\_s \implies\exists t'.\ t\leadsto t'\\ \land\ \llbracket(\Delta\mid\sigma\_1\mid\sigma\_2)\odot\varGamma\vdash t':A\rrbracket = \llbracket\varGamma\rrbracket\vdash t'\_s:\llbracket A\rrbracket\_\tau\end{array}$$

Thus, we can capture parametricity in Grtt via the judicious use of 0 grading (at either the type or computational level) for quantitative semirings. This embedding is not possible from Qtt since Qtt variables graded with 0 may be used arbitrarily in types; the embedding here relies on Grtt's 0 type-grade capturing absence in types for quantitative semirings.

### **3.5 Graded Modal Types and Non-dependent Linear Types**

Grtt can embed the reasoning present in other graded modal type theories (which often have a linear base), for example the explicit semiring-graded necessity modality found in coeffect calculi [10,23] and Granule [46]. We can recover the axioms of a graded necessity modality (usually modelled by an exponential graded comonad [23]). For example, in **Gerty** the following are well typed:

```
counit : (a : (.0, .2) Type) -> (z : (.1 , .0) [.1] a) -> a
counit = \a z -> case z of [y] -> y
comult : (a : (.0, .2) Type) -> (z : (.1 , .0) [.6] a) -> [.2] ([.3] a)
comult = \a z -> case z of [y] -> [[y]]
```
corresponding to ε : □_1 A → A and δ_{r,s} : □_{r∗s} A → □_r(□_s A), operations of graded necessity / graded comonads. Since we cannot use arbitrary terms for grades in the implementation, we have picked some particular grades here for comult. First-class grading is future work, discussed in Section 6.

Linear functions can be captured as A ⊸ B ≜ (x :(1,r) A) → B for an exact usage semiring. It is straightforward to characterise a subset of Grtt programs that maps to the linear λ-calculus, akin to the encodings above. Thus, Grtt provides a suitable basis for studying both linear and non-linear theories alike.

# **4 Metatheory**

We now study Grtt's metatheory. We first explain how substitution presents itself in the theory, and how type preservation follows from a relationship between equality and reduction. We then show admissibility of graded structural rules for contraction, exchange, and weakening, and strong normalization.

### **4.1 Substitution**

We introduce substitution for well-formed contexts and then for typing.

### **Lemma 4 (Substitution for well-formed contexts).** If the following hold:

1. (Δ | σ2 | σ1) ⊙ Γ1 ⊢ t : A, and
2. ⊢ (Δ, σ1, Δ') ⊙ Γ1, x : A, Γ2

Then ⊢ (Δ, (Δ' \ |Δ| + (Δ' / |Δ|) ∗ σ2)) ⊙ Γ1, [t/x]Γ2.

That is, given Γ1, x : A, Γ2 is well-formed, we can cut out x by substituting t for x in Γ2, accounting for the new usage in the context grade vectors. The usage of Γ1 in t is given by σ2, and the usage in A by σ1. When substituting, Δ remains the same, as Γ1 is unchanged. However, to account for the usage in [t/x]Γ2, we have to form a new context grade vector Δ' \ |Δ| + (Δ' / |Δ|) ∗ σ2.

The operation Δ' \ |Δ| (pronounced 'discard') removes grades corresponding to x, by removing the grade at index |Δ| from each grade vector in Δ'. Everything previously used in the typing of x in the context must now be distributed across [t/x]Γ2, which is done by adding on (Δ' / |Δ|) ∗ σ2, which uses Δ' / |Δ| (pronounced 'choose') to produce a vector of grades corresponding to the grades cut out in Δ' \ |Δ|. The multiplication (Δ' / |Δ|) ∗ σ2 produces a context grade vector by scaling σ2 by each element of Δ' / |Δ|. When adding vectors, if the sizes of the vectors are different, then the shorter vector is right-padded with zeroes. Thus Δ' \ |Δ| + (Δ' / |Δ|) ∗ σ2 can be read as 'Δ' without the grades corresponding to x, plus the usage of t scaled by the prior usage of x'.

For example, given typing ((), (1) | 0, 1 | 1, 0) ⊙ a : Type, y : a ⊢ y : a and well-formed context ⊢ ((), (1), (1, 0), (0, 0, 2)) ⊙ a : Type, y : a, x : a, z : t', where t' uses x twice, we can substitute y for x. Let Γ1 = a : Type, y : a, thus |Γ1| = 2, and Γ2 = z : t', Δ' = ((0, 0, 2)), σ1 = 1, 0 and σ2 = 0, 1. Then the context grade of the substitution [y/x]Γ2 is calculated as:

((0, 0, 2)) \ |Γ1| = ((0, 0))    (((0, 0, 2)) / |Γ1|) ∗ σ2 = (2) ∗ (0, 1) = ((0, 2))

Thus the resulting judgment is ⊢ ((), (1), (0, 2)) ⊙ a : Type, y : a, z : [y/x]t'.
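The 'discard' and 'choose' operations, and their combination from Lemma 4, can be sketched as follows (a hypothetical Python rendering, not from the paper's artifact; grade vectors are lists and context grade vectors are lists of lists, over the natural-number semiring):

```python
# Sketch of the context-grade-vector operations from Lemma 4.

def discard(D, i):
    "D \\ i : drop the grade at index i from each grade vector in D."
    return [sigma[:i] + sigma[i + 1:] for sigma in D]

def choose(D, i):
    "D / i : collect the grade at index i of each grade vector in D."
    return [sigma[i] for sigma in D]

def add(s1, s2):
    "Pointwise addition; the shorter vector is right-padded with zeroes."
    n = max(len(s1), len(s2))
    pad = lambda s: s + [0] * (n - len(s))
    return [a + b for a, b in zip(pad(s1), pad(s2))]

def subst_context_grades(D2, i, sigma2):
    "Compute D2 \\ i + (D2 / i) * sigma2, as in Lemma 4's conclusion."
    scaled = [[g * x for x in sigma2] for g in choose(D2, i)]
    return [add(s, t) for s, t in zip(discard(D2, i), scaled)]
```

On the worked example, `subst_context_grades([[0, 0, 2]], 2, [0, 1])` reproduces the calculation above, yielding `[[0, 2]]`.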

**Lemma 5 (Substitution for typing).** If the following premises hold:

$$\begin{array}{l} 1.\ (\Delta\mid\sigma\_{2}\mid\sigma\_{1})\odot\varGamma\_{1}\vdash t:A\\ 2.\ (\Delta,\sigma\_{1},\Delta'\mid\sigma\_{3},s,\sigma\_{4}\mid\sigma\_{5},r,\sigma\_{6})\odot\varGamma\_{1},x:A,\varGamma\_{2}\vdash t':B\\ 3.\ |\sigma\_{3}| = |\sigma\_{5}| = |\varGamma\_{1}| \end{array}$$

$$Then\ \begin{pmatrix}\Delta,(\Delta'\backslash\mid\Delta\mid+(\Delta'/\mid\Delta\mid)\*\sigma\_{2})\\ (\sigma\_{3}+s\*\sigma\_{2}),\sigma\_{4}\\ (\sigma\_{5}+r\*\sigma\_{2}),\sigma\_{6}\end{pmatrix}\odot\varGamma\_{1},[t/x]\varGamma\_{2}\vdash[t/x]t':[t/x]B.$$

As with substitution for well-formed contexts, we account for the replacement of x with t in Γ<sup>2</sup> by 'cutting out' x from the context grade vectors, and adding on the grades required to form t, scaled by the grades that described x's usage. We additionally must account for the altered subject and subject-type usage. We do this in a similar manner, by taking, for example, the usage of Γ<sup>1</sup> in the subject (σ3), and adding on the grades required to form t, scaled by the grade with which x was previously used (s). Subject-type grades are calculated similarly.
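The adjustment of the subject grade vector in Lemma 5's conclusion is simple vector arithmetic; the following is a hedged sketch (hypothetical function name, natural-number grades):

```python
# Sketch of Lemma 5's subject-grade adjustment: the usage sigma3 of Gamma1
# gains t's usage sigma2 scaled by x's subject grade s, and the trailing
# usage sigma4 (for Gamma2) is kept as-is, i.e. (sigma3 + s * sigma2), sigma4.

def subst_subject_grades(sigma3, s, sigma4, sigma2):
    assert len(sigma3) == len(sigma2)  # both grade the variables of Gamma1
    return [a + s * b for a, b in zip(sigma3, sigma2)] + sigma4
```

The subject-type vector (σ5 + r ∗ σ2), σ6 is computed identically with r in place of s.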

### **4.2 Type Preservation**

**Lemma 6 (Reduction implies equality).** If (Δ | σ1 | σ2) ⊙ Γ ⊢ t1 : A and t1 ⇝ t2, then (Δ | σ1 | σ2) ⊙ Γ ⊢ t1 = t2 : A.

**Lemma 7 (Equality inversion).** If (Δ | σ1 | σ2) ⊙ Γ ⊢ t1 = t2 : A, then (Δ | σ1 | σ2) ⊙ Γ ⊢ t1 : A and (Δ | σ1 | σ2) ⊙ Γ ⊢ t2 : A.

**Lemma 8 (Type preservation).** If (Δ | σ1 | σ2) ⊙ Γ ⊢ t : A and t ⇝ t', then (Δ | σ1 | σ2) ⊙ Γ ⊢ t' : A.

Proof. By Lemma 6 we have (Δ | σ1 | σ2) ⊙ Γ ⊢ t = t' : A, and therefore by Lemma 7 we have (Δ | σ1 | σ2) ⊙ Γ ⊢ t' : A, as required.

### **4.3 Structural Rules**

We now consider the structural rules of contraction, exchange, and weakening.

**Lemma 9 (Contraction).** The following rule is admissible:

$$\frac{\begin{pmatrix} \Delta\_{1},\sigma\_{1},(\sigma\_{1},0),\Delta\_{2}\\ \sigma\_{2},s\_{1},s\_{2},\sigma\_{3}\\ \sigma\_{4},r\_{1},r\_{2},\sigma\_{5} \end{pmatrix} \odot \varGamma\_{1}, x:A, y:A, \varGamma\_{2} \vdash t:B \quad |\Delta\_{1}| = |\sigma\_{2}| = |\sigma\_{4}| = |\varGamma\_{1}|}{\begin{pmatrix} \Delta\_{1},\sigma\_{1},\operatorname{contr}(|\Delta\_{1}|;\Delta\_{2})\\ \sigma\_{2},(s\_{1}+s\_{2}),\sigma\_{3}\\ \sigma\_{4},(r\_{1}+r\_{2}),\sigma\_{5} \end{pmatrix} \odot \varGamma\_{1}, z:A, [z,z/x,y]\varGamma\_{2} \vdash [z,z/x,y]t:[z,z/x,y]B}\ \text{Contr}$$

The operation contr(π; Δ) contracts the elements at indices π and π + 1 of each vector in Δ by combining them with the semiring addition, defined contr(π; Δ) = Δ \ (π+1) + (Δ / (π+1)) ∗ (**0**^π, 1), where (**0**^π, 1) is the vector of π zeroes followed by 1. Admissibility follows from the semiring addition, which serves to contract dependencies, being threaded throughout the rules.

**Lemma 10 (Exchange).** The following rule is admissible:

$$\frac{x \not\in FV(B) \quad |\Delta\_1| = |\sigma\_3| = |\sigma\_5| = |\varGamma\_1| \quad \begin{pmatrix} \Delta\_1,\sigma\_1,(\sigma\_2,0),\Delta\_2\\ \sigma\_3,s\_1,s\_2,\sigma\_4\\ \sigma\_5,r\_1,r\_2,\sigma\_6 \end{pmatrix} \odot \varGamma\_1, x : A, y : B, \varGamma\_2 \vdash t : C}{\begin{pmatrix} \Delta\_1,\sigma\_2,(\sigma\_1,0),\operatorname{exch}(|\Delta\_1|;\Delta\_2)\\ \sigma\_3,s\_2,s\_1,\sigma\_4\\ \sigma\_5,r\_2,r\_1,\sigma\_6 \end{pmatrix} \odot \varGamma\_1, y : B, x : A, \varGamma\_2 \vdash t : C}\ \text{Exc}$$

Notice that if you strip away the vector fragment and sizing premise, this is exactly the form of exchange we would expect in a dependent type theory: if x and y are assumptions in a context typing t : C, and the type of y does not depend upon x, then we can type t : C when we swap the order of x and y.

The action on grade vectors is simple: we swap the grades associated with each of the variables. For the context grade vector however, we must do two things: first, we capture the formation of A with σ1, and the formation of B with σ2, 0 (indicating x being used with grade 0 in B), then swap these around, cutting the final grade from σ2, 0, and adding 0 to the end of σ1 to ensure correct sizing. Next, the operation exch(|Δ1| ; Δ2) swaps the element at index |Δ1| (i.e., that corresponding to usage of x) with the element at index |Δ1| + 1 (corresponding to y) for every vector in Δ2; this exchange operation ensures that usage in the trailing context is reordered appropriately.

**Lemma 11 (Weakening).** The following rule is admissible:

$$\frac{(\Delta\_1, \Delta\_2 \mid \sigma\_1, \sigma\_1' \mid \sigma\_2, \sigma\_2') \odot \varGamma\_1, \varGamma\_2 \vdash t: B \quad (\Delta\_1 \mid \sigma\_3 \mid \mathbf{0}) \odot \varGamma\_1 \vdash A: \mathsf{Type}\_l \quad |\sigma\_1| = |\sigma\_2| = |\varGamma\_1|}{(\Delta\_1, \sigma\_3, \operatorname{ins}(|\Delta\_1|; 0; \Delta\_2) \mid \sigma\_1, 0, \sigma\_1' \mid \sigma\_2, 0, \sigma\_2') \odot \varGamma\_1, x: A, \varGamma\_2 \vdash t: B}\ \text{Weak}$$

Weakening introduces irrelevant assumptions to a context. We do this by capturing the usage in the formation of the assumption's type with σ<sup>3</sup> to preserve the well-formedness of the context. We then indicate irrelevance of the assumption by grading with 0 in appropriate places. The operation ins(π; s; Δ) inserts the element s at index π for each σ in Δ, such that all elements preceding index π (in σ) keep their positions, and every element at index π or greater (in σ) will be shifted one index later in the new vector. The 0 grades in the subject and subject-type grade vector positions correspond to the absence of the irrelevant assumption from the subject and subject's type.
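The three vector operations behind these structural rules can be sketched directly (hypothetical Python, not from the paper's artifact; contr is shown in its computed form rather than via the discard and choose operations used to define it above):

```python
# Sketch of the vector operations behind Contr, Exc, and Weak; a context
# grade vector D is a list of grade vectors (lists of naturals).

def contr(p, D):
    "Combine indices p and p+1 of each vector with semiring addition."
    return [s[:p] + [s[p] + s[p + 1]] + s[p + 2:] for s in D]

def exch(p, D):
    "Swap the grades at indices p and p+1 in each vector."
    return [s[:p] + [s[p + 1], s[p]] + s[p + 2:] for s in D]

def ins(p, g, D):
    "Insert grade g at index p of each vector, shifting later grades."
    return [s[:p] + [g] + s[p:] for s in D]
```

For instance, contracting indices 1 and 2 of (1, 2, 3, 4) adds the two middle grades, while insertion with grade 0 records an irrelevant assumption.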

### **4.4 Strong Normalization**

We adapt Geuvers' strong normalization proof for the Calculus of Constructions (CC) [24] to a fragment of Grtt (called Grtt{0,1}) restricted to two universe levels and without variables of type Type1. This results in a less expressive system than full Grtt when it comes to higher kinds, but this is orthogonal to the main idea here of grading. We briefly overview the strong normalization proof; details can be found in the extended version [44]. Note this strong normalization result is with respect to β-reduction only (our semantics does not include η-reduction).

We use the proof technique of saturated sets, based on the reducibility candidates of Girard [29]. While Grtt{0,1} has a collapsed syntax, we use judgments to break typing up into stages. We use these sets to match on whether an expression is a kind, type, constructor, or a function (we will refer to these last as terms).

**Definition 3.** Typing can be broken up into the following stages:

$$\begin{array}{l} \mathsf{Kind} := \\{A \mid \exists \Delta, \sigma\_1, \varGamma.\ (\Delta \mid \sigma\_1 \mid \mathbf{0}) \odot \varGamma \vdash A : \mathsf{Type}\_1\\}\\ \mathsf{Type} := \\{A \mid \exists \Delta, \sigma\_1, \varGamma.\ (\Delta \mid \sigma\_1 \mid \mathbf{0}) \odot \varGamma \vdash A : \mathsf{Type}\_0\\}\\ \mathsf{Con} := \\{t \mid \exists \Delta, \sigma\_1, \sigma\_2, \varGamma, A.\ (\Delta \mid \sigma\_1 \mid \sigma\_2) \odot \varGamma \vdash t : A \land (\Delta \mid \sigma\_2 \mid \mathbf{0}) \odot \varGamma \vdash A : \mathsf{Type}\_1\\}\\ \mathsf{Term} := \\{t \mid \exists \Delta, \sigma\_1, \sigma\_2, \varGamma, A.\ (\Delta \mid \sigma\_1 \mid \sigma\_2) \odot \varGamma \vdash t : A \land (\Delta \mid \sigma\_2 \mid \mathbf{0}) \odot \varGamma \vdash A : \mathsf{Type}\_0\\} \end{array}$$

**Lemma 12 (Classification).** We have Kind ∩ Type = ∅ and Con ∩ Term = ∅.

The classification lemma states that we can safely case split over kinds and types, or constructors and terms without fear of an overlap occurring.

Saturated sets are essentially collections of strongly normalizing terms that are closed under β-reduction. The intuition behind this proof is that every typable program ends up in some saturated set, and hence, is strongly normalizing.

**Definition 4 (Base terms and saturated terms).** Informally, the set of base terms B is inductively defined from variables and Type0 and Type1, and compound terms over base terms B and strongly normalising terms SN.

A set of terms X is saturated if X ⊂ SN, B ⊂ X, and if red<sup>k</sup> t ∈ X and t ∈ SN, then t ∈ X. Thus saturated sets are closed under strongly normalizing terms with a key redex, denoted red<sup>k</sup> t, which are redexes or a redex at the head of an elimination form. SAT denotes the collection of saturated sets.

**Lemma 13 (**SN **saturated).** All saturated sets are non-empty; SN is saturated.

Since Grtt{0,1} allows computation in types as well as in terms, we separate the interpretations for kinds and types, where the former is a set of the latter.

**Definition 5.** For A ∈ Kind, the kind interpretation 𝒦⟦A⟧ is defined:

$$\begin{array}{ll} \mathcal{K}\llbracket\mathsf{Type}\_0\rrbracket = \text{SAT} & \mathcal{K}\llbracket\Box\_s A\rrbracket = \mathcal{K}\llbracket A\rrbracket\\ \mathcal{K}\llbracket(x :\_{(s,r)} A) \to B\rrbracket = \\{f \mid f : \mathcal{K}\llbracket A\rrbracket \to \mathcal{K}\llbracket B\rrbracket\\} & \text{if } A, B \in \mathsf{Kind}\\ \mathcal{K}\llbracket(x :\_{(s,r)} A) \to B\rrbracket = \mathcal{K}\llbracket A\rrbracket & \text{if } A \in \mathsf{Kind}, B \in \mathsf{Type}\\ \mathcal{K}\llbracket(x :\_{(s,r)} A) \to B\rrbracket = \mathcal{K}\llbracket B\rrbracket & \text{if } A \in \mathsf{Type}, B \in \mathsf{Kind}\\ \mathcal{K}\llbracket(x :\_r A) \otimes B\rrbracket = \mathcal{K}\llbracket A\rrbracket \times \mathcal{K}\llbracket B\rrbracket & \text{if } A, B \in \mathsf{Kind}\\ \mathcal{K}\llbracket(x :\_r A) \otimes B\rrbracket = \mathcal{K}\llbracket A\rrbracket & \text{if } A \in \mathsf{Kind}, B \in \mathsf{Type}\\ \mathcal{K}\llbracket(x :\_r A) \otimes B\rrbracket = \mathcal{K}\llbracket B\rrbracket & \text{if } A \in \mathsf{Type}, B \in \mathsf{Kind} \end{array}$$

Next we define the interpretation of types, which requires the interpretation to be parametric on an interpretation of type variables called a type evaluation. This is necessary to make the interpretation well-founded (first realized by Girard [29]).

**Definition 6.** Type valuations, Δ ⊙ Γ ⊨ ε, are defined as follows:

$$\frac{}{\emptyset \odot \emptyset \models \emptyset}\ \text{Emp} \qquad \frac{X \in \mathcal{K}\llbracket A\rrbracket \quad \Delta \odot \varGamma \models \varepsilon \quad (\Delta \mid \sigma \mid \mathbf{0}) \odot \varGamma \vdash A : \mathsf{Type}\_1}{(\Delta, \sigma) \odot \varGamma, x : A \models \varepsilon[x \mapsto X]}\ \text{Ty} \qquad \frac{\Delta \odot \varGamma \models \varepsilon \quad (\Delta \mid \sigma \mid \mathbf{0}) \odot \varGamma \vdash A : \mathsf{Type}\_0}{(\Delta, \sigma) \odot \varGamma, x : A \models \varepsilon}\ \text{Tm}$$

Type valuations ignore term variables (rule Tm); in fact, the interpretations of both types and kinds ignore them, because we are defining sets of terms over types, and thus terms in types do not contribute to the definition of these sets. However, as these interpretations define sets of open terms, we must carry a graded context around where necessary. Thus, type valuations are with respect to a well-formed graded context Δ ⊙ Γ. We now outline the type interpretation.

**Definition 7.** For type valuation Δ ⊙ Γ ⊨ ε and a type A ∈ (Kind ∪ Type ∪ Con) with A typable in Δ ⊙ Γ, the interpretation of types ⟦A⟧_ε is defined inductively. For brevity, we list just a few illustrative cases, including modalities and some function cases; the complete definition is given in the extended version [44].

$$\begin{array}{ll} \llbracket\mathsf{Type}\_1\rrbracket\_\varepsilon = \text{SN} \qquad \llbracket\mathsf{Type}\_0\rrbracket\_\varepsilon = \lambda X \in \text{SAT}.\,\text{SN} & \\ \llbracket x\rrbracket\_\varepsilon = \varepsilon\, x & \text{if } x \in \mathsf{Con}\\ \llbracket\Box\_s A\rrbracket\_\varepsilon = \llbracket A\rrbracket\_\varepsilon & \\ \llbracket\lambda x : A.B\rrbracket\_\varepsilon = \lambda X \in \mathcal{K}\llbracket A\rrbracket.\,\llbracket B\rrbracket\_{\varepsilon[x \mapsto X]} & \text{if } A \in \mathsf{Kind}, B \in \mathsf{Con}\\ \llbracket A\, B\rrbracket\_\varepsilon = \llbracket A\rrbracket\_\varepsilon(\llbracket B\rrbracket\_\varepsilon) & \text{if } B \in \mathsf{Con}\\ \llbracket(x :\_{(s,r)} A) \to B\rrbracket\_\varepsilon = \lambda X \in \mathcal{K}\llbracket A\rrbracket \to \mathcal{K}\llbracket B\rrbracket.\ \bigcap\_{Y \in \mathcal{K}\llbracket A\rrbracket}(\llbracket A\rrbracket\_\varepsilon\, Y \to \llbracket B\rrbracket\_{\varepsilon[x \mapsto Y]}(X(Y))) & \text{if } A, B \in \mathsf{Kind} \end{array}$$

Grades play no role in the reduction relation for Grtt, and hence our interpretation erases graded modalities and their introduction and elimination forms (translated into substitutions). In fact, the above interpretation can be seen as a translation of Grtt{0,1} into a non-substructural set theory; there is no data-usage tracking in the image of the interpretation. Tensors are translated into Cartesian products whose eliminators are translated into substitutions, similarly to graded modalities. All terms however remain well-typed through the interpretation.

The interpretation of terms corresponds to term valuations that are used to close the term before interpreting it into the interpretation of its type.

**Definition 8.** Valid term valuations, Δ ⊙ Γ ⊨_ε ρ, are defined as follows:

$$\frac{}{\emptyset \odot \emptyset \models\_\emptyset \emptyset}\ \text{Emp} \qquad \frac{t \in (\llbracket A\rrbracket\_\varepsilon)(\varepsilon\, x) \quad \Delta \odot \varGamma \models\_\varepsilon \rho \quad (\Delta \mid \sigma \mid \mathbf{0}) \odot \varGamma \vdash A : \mathsf{Type}\_1}{(\Delta, \sigma) \odot \varGamma, x : A \models\_\varepsilon \rho[x \mapsto t]}\ \text{Ty} \qquad \frac{t \in \llbracket A\rrbracket\_\varepsilon \quad \Delta \odot \varGamma \models\_\varepsilon \rho \quad (\Delta \mid \sigma \mid \mathbf{0}) \odot \varGamma \vdash A : \mathsf{Type}\_0}{(\Delta, \sigma) \odot \varGamma, x : A \models\_\varepsilon \rho[x \mapsto t]}\ \text{Tm}$$

We interpret terms as substitutions, but graded modalities must be erased and their elimination forms converted into substitutions (and similarly for the eliminator for tensor products).

**Definition 9.** Suppose Δ ⊙ Γ ⊨_ε ρ. Then the interpretation of a term t typable in Δ ⊙ Γ is ⟦t⟧_ρ = ρ t, but where all let-expressions are translated into substitutions, and all graded modalities are erased.

Finally, we prove our main result using semantic typing, which implies strong normalization. Suppose (Δ | σ1 | σ2) ⊙ Γ ⊢ t : A; then:

**Definition 10.** Semantic typing, (Δ | σ1 | σ2) ⊙ Γ ⊨ t : A, is defined as follows:

1. If (Δ | σ | **0**) ⊙ Γ ⊢ A : Type1, then for every Δ ⊙ Γ ⊨_ε ρ, we have ⟦t⟧_ρ ∈ ⟦A⟧_ε(⟦t⟧_ε).
2. If (Δ | σ | **0**) ⊙ Γ ⊢ A : Type0, then for every Δ ⊙ Γ ⊨_ε ρ, we have ⟦t⟧_ρ ∈ ⟦A⟧_ε.

**Theorem 3 (Soundness for Semantic Typing).** (Δ | σ1 | σ2) ⊙ Γ ⊨ t : A.

**Corollary 1 (Strong Normalization).** We have t ∈ SN.

# **5 Implementation**

Our implementation **Gerty** is based on a bidirectionalised version of the typing rules here, somewhat following traditional schemes of bidirectional typing [19,20] but with grading (similar to Granule [46] but adapted considerably for the dependent setting). We briefly outline the implementation scheme and highlight a few key points, rules, and examples. We use this implementation to explore further applications of Grtt, namely optimising type checking algorithms.

Bidirectional typing splits declarative typing rules into check and infer modes. Furthermore, bidirectional Grtt rules split the grading (left of ⊙) into input and output parts, where (Δ | σ1 | σ2) ⊙ Γ ⊢ t : A is implemented via:

$$(\text{(check)}\ \Delta; \varGamma \vdash t \Leftarrow A; \sigma\_1; \sigma\_2 \quad or \ (\text{infer})\ \ \Delta; \varGamma \vdash t \Rightarrow A; \sigma\_1; \sigma\_2$$

where ⇐ rules check that t has type A and ⇒ rules infer (calculate) that t has type A. In both judgments, the context grading Δ and context Γ left of ⊢ are inputs, whereas the grade vectors σ1 and σ2 to the right of A are outputs. This input-output context approach resembles that employed in linear type checking [5,32,62]. Rather than following a "left over" scheme as in these works (where the output context explains what resources are left), the output grades here explain what has been used according to the analysis of grading ('adding up' rather than 'taking away').
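As a toy illustration of the 'adding up' discipline (our own sketch, not Gerty's actual algorithm), one can compute the usage vector of an untyped λ-term by summing the usage of subterms, assuming distinct variable names:

```python
# Toy additive usage accounting: rather than threading a leftover context,
# each subterm reports what it used, and applications add the reports up.

def usage(term, xs):
    """Usage vector of `term` over variables `xs` (assumed distinct)."""
    tag = term[0]
    if tag == "var":
        return [1 if v == term[1] else 0 for v in xs]
    if tag == "app":
        return [a + b for a, b in zip(usage(term[1], xs), usage(term[2], xs))]
    if tag == "lam":
        # count the body's usage, then drop the bound variable's entry
        return usage(term[2], xs + [term[1]])[:len(xs)]
    raise ValueError(f"unknown term: {tag}")
```

For example, `x x` reports usage 2 of x, and a λ-abstraction's bound variable is dropped from the report, mirroring how grade vectors shrink at binders.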

For example, the following is the infer rule for function elimination:

$$\begin{array}{ll} \Delta; \Gamma \vdash t\_1 \Rightarrow (x:\_{(s,r)} A) \rightarrow B; \sigma\_2; \sigma\_{13} \\ \Delta; \Gamma \vdash t\_2 \Leftarrow A; \sigma\_4; \sigma\_1 \\ \Delta, \sigma\_1; \Gamma, x:A \vdash B \Rightarrow \mathsf{Type}\_l; \sigma\_3, r; \mathbf{0} \\ \hline \Delta; \Gamma \vdash t\_1 \, t\_2 \Rightarrow [t\_2/x]B; \sigma\_2 + s\*\sigma\_4; \sigma\_3 + r\*\sigma\_4 \end{array} \Rightarrow \lambda\_e.$$

The rule can be read by starting at the input of the conclusion (left of ⊢), then reading top down through each premise, to calculate the output grades in the rule's conclusion. Any concrete value or already-bound variable appearing in the output grades of a premise can be read as causing an equality check in the type checker. The last premise checks that the output subject-type grade σ13 from the first premise matches σ1 + σ3 (which were calculated by later premises).

In contrast, function introduction is a check rule:

$$\frac{\Delta; \varGamma \vdash A \Rightarrow \mathsf{Type}\_{l}; \sigma\_{1}; \mathbf{0} \qquad \Delta, \sigma\_{1}; \varGamma, x: A \vdash t \Leftarrow B; \sigma\_{2}, s; \sigma\_{3}, r}{\Delta; \varGamma \vdash \lambda x. t \Leftarrow (x:\_{(s,r)} A) \rightarrow B; \sigma\_{2}; \sigma\_{1} + \sigma\_{3}} \Leftarrow \lambda\_{i}$$

Thus, dependent functions can be checked against type (x :(s,r) A) → B given input Δ; Γ by first inferring the type of A and checking that its output subject-type grade comprises all zeros **0**. Then the body of the function t is checked against B under the context Δ, σ1; Γ, x : A, producing grade vectors σ2, s' and σ3, r', where it is checked that s' = s and r' = r (described implicitly in the rule), i.e., the calculated grades match those of the binder.

The implementation anticipates some further work for Grtt: the potential for grades which are first-class terms, for which we anticipate complex equations on grades. For grade equality, **Gerty** has two modes: one which normalises terms and then compares for syntactic equality, and another which discharges constraints via an off-the-shelf SMT solver (we use Z3 [17]). We discuss briefly some performance implications in the next section.
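The normalisation-based mode can be illustrated for closed grade terms over the natural-number semiring (a toy sketch of ours, not Gerty's internals; the SMT mode instead compiles such constraints to Z3):

```python
# Toy grade equality by normalisation: grade terms are ints, ("+", a, b),
# or ("*", a, b); closed terms normalise to a numeral, then compare.

def norm(g):
    "Evaluate a closed grade term to a numeral of the natural semiring."
    if isinstance(g, int):
        return g
    op, a, b = g
    return norm(a) + norm(b) if op == "+" else norm(a) * norm(b)

def grades_equal(g1, g2):
    "Normalise both sides and compare syntactically (here: numerals)."
    return norm(g1) == norm(g2)
```

With first-class grade terms containing variables, normal forms are no longer numerals and the comparison becomes genuinely symbolic, which is where the SMT mode earns its keep.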

Using Grades to Optimise Type Checking Abel posited that a dependent theory with quantitative resource tracking at the type level could leverage linearitylike optimisations in type checking [2]. Our implementation provides a research vehicle for exploring this idea; we consider one possible optimisation here.

Key to dependent type checking is the substitution of terms into types in elimination forms (i.e., application, tensor elimination). However, in a quantitative semiring setting, if a variable has 0 subject-type grade, then we know it is irrelevant to type formation (it is not semantically depended upon, i.e., during normalisation). Subsequently, substitutions into a 0-graded variable can be elided (or allocations to a closure environment can be avoided). We implemented this optimisation in **Gerty** when inferring the type of an application t1 t2 (rule ⇒λe above), where the type of t1 is inferred as (x :(s,0) A) → B. For a quantitative semiring we know that x is irrelevant in B, thus we need not perform the substitution [t2/x]B when type checking the application.
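The core decision of the optimisation can be sketched as follows (hypothetical Python over a toy type AST; our illustration, not Gerty's internals):

```python
# Sketch: when instantiating the codomain B of (x :_(s,r) A) -> B at an
# argument t2, a 0 subject-type grade licenses skipping the substitution.

def subst_ty(ty, x, t):
    "Naive substitution into a toy type AST (no binders, for illustration)."
    tag = ty[0]
    if tag == "var":
        return t if ty[1] == x else ty
    if tag == "arrow":
        return ("arrow", subst_ty(ty[1], x, t), subst_ty(ty[2], x, t))
    return ty  # constants such as ("type", l)

def result_type(x, subject_type_grade, B, t2):
    "Type of an application, given the binder's subject-type grade."
    if subject_type_grade == 0:
        # quantitative semiring: x cannot occur relevantly in B,
        # so elide the walk over B entirely
        return B
    return subst_ty(B, x, t2)
```

The saving is precisely the avoided traversal of B (and any allocation it entails), which is what Table 1 measures at scale.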

We evaluate this on simple **Gerty** programs of an n-ary "fanout" combinator implemented via an n-ary application combinator, e.g., for arity 3:

```
app3 : (a : (0, 6) Type 0) -> (b : (0, 2) Type 0)
app3 = \a -> \b -> \x0 -> \x1 -> \x2 -> \f -> f x0 x1 x2
fan3 : (a : (0, 4) Type 0) -> (b : (0, 2) Type 0)
fan3 = \a -> \b -> \f -> \x -> app3 a b x x x f
```
Note that fan3 uses its parameter x three times (hence the grade 3) which then incurs substitutions into the type of app3 during type checking, but each such substitution is redundant since the type does not depend on these parameters, as reflected by the 0 subject-type grades.

To evaluate the optimisation and SMT solving vs. normalisation-based equality, we ran **Gerty** on the fan out program for arities from 3 to 8, with and without the optimisation and under the two equality approaches.


**Table 1.** Performance analysis of grade-based optimisations to type checking. Times in milliseconds to 2 d.p. with the standard error given in brackets. Measurements are the mean of 10 trials (run on a 2.7 GHz Intel Core, 8 GB of RAM, Z3 4.8.8).

Table 1 gives the results. For grade equality by normalisation, the optimisation has a positive effect on speedup, getting increasingly significant (up to 38%) as the overall cost increases. For SMT-based grade equality, the optimisation causes some slow down for arity 4 and 5 (and just breaking even for arity 3). This is because working out whether the optimisation can be applied requires checking whether grades are equal to 0, which incurs extra SMT solver calls. Eventually, this cost is outweighed by the time saved by reducing substitutions. Since the grades here are all relatively simple, it is usually more efficient for the type checker to normalise and compare terms rather than compiling to SMT and starting up the external solver, as seen by longer times for the SMT approach.

The baseline performance here is poor (the implementation is not highly optimised), partly due to the overhead of frequently computing type formation judgments to accurately account for grading. Such checks are often recomputed and could be optimised away by memoisation. Nevertheless, this experiment gives evidence that grades can indeed be used to optimise type checking. A thorough investigation of grade-directed optimisations is future work.

# **6 Discussion**

Grading, Coeffects, and Quantitative Types The notion of coeffects, describing how a program depends on its context, arose in the literature from two directions: as a dualisation of effect types [48,49] and a generalisation of Bounded Linear Logic to general resource semirings [25,10]. Coeffect systems can capture reuse bounds, information flow security [23], hardware scheduling constraints [25], and sensitivity for differential privacy [16,22]. A coeffect-style approach also enables linear types to be retrofitted to Haskell [8]. A common thread is the annotation of variables in the context with usage information, drawn from a semiring. Our approach generalises this idea to capture type, context, and computational usage.
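The common thread described above — contexts annotated with semiring elements, combined pointwise — can be made concrete with a small sketch. This is our own illustration of the general coeffect pattern, not the calculus of any particular cited system; the natural-number semiring stands in for the general resource semiring.

```python
# Illustrative sketch of the common core of coeffect systems: usage contexts map
# variables to semiring elements, combined pointwise by + (when typing multiple
# subterms) and scaled by * (for promotion/graded-modality introduction).
# Here the semiring is (N, +, *, 0, 1), counting exact usage.

from collections import Counter

def ctx_add(g1, g2):
    """Pointwise semiring addition of two usage contexts."""
    out = Counter(g1)
    out.update(g2)      # Counter.update adds counts
    return dict(out)

def ctx_scale(r, g):
    """Scale every usage in context g by grade r."""
    return {x: r * n for x, n in g.items()}

# usage of the body of `\f -> \x -> f x x`: f used once, x used twice
body = ctx_add({'f': 1, 'x': 1}, {'x': 1})
assert body == {'f': 1, 'x': 2}
# boxing the body at grade 3 multiplies all usages
assert ctx_scale(3, body) == {'f': 3, 'x': 6}
```

Swapping the semiring (e.g. to security levels with min/max) changes the analysis without changing the bookkeeping, which is the point of the semiring-parametric design.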

McBride [43] reconciles linear and dependent types, allowing types to depend on linear values, refined by Atkey [6] as Quantitative Type Theory. Qtt employs coeffect-style annotation of each assumption in a context with an element of a resource accounting algebra, with judgments of the form:

$$x\_1 \stackrel{\rho\_1}{:} A\_1, \dots, x\_n \stackrel{\rho\_n}{:} A\_n \vdash M \stackrel{\rho}{:} B$$

where ρ<sub>i</sub> and ρ are elements of a semiring, with ρ = 0 or ρ = 1 respectively denoting a term which can be used in type formation (erased at runtime) or at runtime. Dependent function arrows are of the form (x :<sub>ρ</sub> A) → B, where ρ is a semiring element that denotes the computational usage of the parameter.

Variables used for type formation but not computation are annotated by 0. Subsequently, type formation rules all have the form 0Γ ⊢ T, meaning every variable assumption has a 0 annotation. Grtt is similar to Qtt, but differs in its more extensive grading to track usage in types, rather than blanketing all type usage with 0. In Atkey's formulation, a term can be promoted to a type if its result and dependency quantities are all 0. A set of rules provide formation of computational type terms, but these are also graded at 0. Subsequently, it is not possible to construct an inhabitant of Type that can be used at runtime. We avoid this shortcoming by allowing matching on types. For example, a computation t that inspects a type variable a would be typed as (Δ, **0**, Δ' | σ1, 1, σ1' | σ2, r, σ2') ⊙ Γ, a : Type, Γ' ⊢ t : B, denoting 1 computational use of a and r uses of a in the type B.

At first glance, it seems Qtt could be encoded into Grtt by taking the semiring R of Qtt and parameterising Grtt by the semiring R ∪ {0̂}, where 0̂ denotes arbitrary usage in type formation. However, there is impedance between the two systems as Qtt always annotates type use with 0. It is not clear how to make this happen in Grtt whilst still having non-0 tracking at the computational level, since we use one semiring for both. Exploring an encoding is future work.

Choudhury et al. [13] give a system closely related to (but arguably simpler than) Qtt called GraD. One key difference is that rather than annotating type usage with 0, grades are simply ignored in types. This makes for a surprisingly flexible system. In addition, they show that irrelevance is captured by the 0 grade using a heap-based semantics (a result leveraged in Section 3). GraD, however, does not have the power of type-grades presented here.

Dependent Types and Modalities Dal Lago and Gaboardi extend PCF with linear and lightweight dependent types [15] (later adapted for differential privacy analysis [22]). They add a natural number type indexed by upper and lower bound terms which index a modality. Combined with linear arrows of the form [a < I]·σ ⊸ τ, these describe functions using the parameter at most I times (where the modality acts as a binder for the index variable a, which denotes instantiations). Their system is leveraged to give fine-grained cost analyses in the context of Implicit Computational Complexity. Whilst a powerful system, their approach is restricted in terms of dependency: only a specialised type can depend on specialised natural-number-indexed terms (which are non-linear).

Gratzer et al. define a dependently-typed language with a Fitch-style modality [30]. It seems that such an approach could also be generalised to a graded modality, although we have used the natural-deduction style for our graded modality rather than the Fitch-style.

As discussed in Section 1, our approach closely resembles Abel's resourceful dependent types [2]. Our work expands on the idea, including tensors and the graded modalities. We considerably developed the associated metatheory, provide an implementation, and study applications.

Further Work One expressive extension is to capture analyses which have an ordering, e.g., grading by a pre-ordered semiring, allowing a notion of approximation. This would enable analyses such as bounded reuse from Bounded Linear Logic [28], intervals giving lower and upper bounds on use [46], and top-completed semirings, with an ∞-element denoting arbitrary usage as a fall-back. We have made progress in exploring the interaction between approximation and dependent types; the remainder is left as future work.
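One of the orderings mentioned above can be sketched directly. The following is our own toy model of the interval analysis (not from [46]): grades are pairs [l, u] bounding usage, and the approximation pre-order is interval inclusion, so a tighter bound may always be weakened to a looser one.

```python
# Sketch of a pre-ordered grade algebra: the interval "semiring" over N,
# whose elements (l, u) give lower/upper bounds on usage. Representation
# and function names are ours, for illustration only.

def iv_add(a, b):
    """Combined use of two subterms: bounds add componentwise."""
    return (a[0] + b[0], a[1] + b[1])

def iv_mul(a, b):
    """Nested (scaled) use: bounds multiply componentwise."""
    return (a[0] * b[0], a[1] * b[1])

def iv_approx(a, b):
    """Approximation pre-order: a <= b iff interval a is contained in b,
    i.e. any usage within bounds a is also within the looser bounds b."""
    return b[0] <= a[0] and a[1] <= b[1]

assert iv_add((0, 1), (1, 2)) == (1, 3)
assert iv_mul((1, 2), (2, 2)) == (2, 4)
assert iv_approx((1, 2), (0, 3))       # [1,2] may be weakened to [0,3]
assert not iv_approx((0, 3), (1, 2))   # but not tightened
```

A graded type system with such an ordering would admit a subsumption rule along `iv_approx`, which is exactly the interaction with dependent types flagged as future work.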

A powerful extension of Grtt for future work is to allow grades to be first-class terms. Typing rules in Grtt involving grades could be adapted to internalise grade elements as first-class terms. We could then, e.g., define the map function over sized vectors, which requires that the parameter function is used exactly as many times as the length of the vector:

$$\begin{array}{c} \mathsf{map} : (n :\_{(0,5)} \mathsf{nat}) \to (a :\_{(0,n+1)} \mathsf{Type}) \to (b :\_{(0,n+1)} \mathsf{Type}) \to \\ (f :\_{(n,0)} (x :\_{(1,0)} a) \to b) \to (xs :\_{(1,0)} \mathsf{Vec} \, n \, a) \to \mathsf{Vec} \, n \, b \end{array}$$

This type provides strong guarantees: the only well-typed implementations do the correct thing, up to permutation of the result vector. Without the grading, an implementation could apply f fewer than n times, replicating some of the transformed elements; with it, we know that f must be applied exactly n times.

A further appealing possibility for Grtt is to allow the semiring to be defined internally, rather than as a meta-level parameter, leveraging dependent types for proofs of key properties. An implementation could specify what is required for a semiring instance, e.g., a record type capturing the operations and properties of a semiring. The rules of Grtt could then be extended, similarly to the extension to first-class grades, with the provision of the semiring(s) coming from Grtt terms. Thus, anywhere with a grading premise (Δ | σ1 | σ2) ⊙ Γ ⊢ r : R would also require a premise (Δ | σ2 | **0**) ⊙ Γ ⊢ R : Semiring. This opens up the ability for programmers and library developers to provide custom modes of resource tracking with their libraries, allowing domain-specific program verification.

Conclusions The paradigm of 'grading' exposes the inherent structure of a type theory, proof theory, or semantics by matching the underlying structure with some algebraic structure augmenting the types. This idea has been employed for reasoning about side effects via graded monads [35], and reasoning about data flow as discussed here by semiring grading. Richer algebras could be employed to capture other aspects, such as ordered logics in which the exchange rule can be controlled via grading (existing work has done this via modalities [34]).

We developed the core of grading in the context of dependent types, treating types and terms equally (as one comes to expect in dependent type theories). The tracking of data flow in types appears complex since we must account for how variables are used to form types both in the context and in the subject type, taking care not to double-count context formation use. The result, however, is a powerful system for studying dependencies in type theories, as shown by our ability to study different theories just by specialising grades. Whilst not yet a fully fledged implementation, **Gerty** is a useful test bed for further exploration.

Acknowledgments Orchard is supported by EPSRC grant EP/T013516/1.

# **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Automated Termination Analysis of Polynomial Probabilistic Programs

Marcel Moosbrugger<sup>1</sup>(✉), Ezio Bartocci<sup>1</sup>, Joost-Pieter Katoen<sup>2</sup>, and Laura Kovács<sup>1</sup>

<sup>1</sup> TU Wien, Vienna, Austria
marcel.moosbrugger@tuwien.ac.at
<sup>2</sup> RWTH Aachen University, Aachen, Germany

Abstract. The termination behavior of probabilistic programs depends on the outcomes of random assignments. Almost sure termination (AST) is concerned with the question whether a program terminates with probability one on all possible inputs. Positive almost sure termination (PAST) focuses on termination in a finite expected number of steps. This paper presents a fully automated approach to the termination analysis of probabilistic while-programs whose guards and expressions are polynomial expressions. As proving (positive) AST is undecidable in general, existing proof rules typically provide sufficient conditions. These conditions mostly involve constraints on supermartingales. We consider four proof rules from the literature and extend these with generalizations of existing proof rules for (P)AST. We automate the resulting set of proof rules by effectively computing asymptotic bounds on polynomials over the program variables. These bounds are used to decide the sufficient conditions – including the constraints on supermartingales – of a proof rule. Our software tool AMBER can thus check AST, PAST, as well as their negations for a large class of polynomial probabilistic programs, while carrying out the termination reasoning fully with polynomial witnesses. Experimental results show the merits of our generalized proof rules and demonstrate that AMBER can handle probabilistic programs that are out of reach for other state-of-the-art tools.

Keywords: Probabilistic Programming · Almost sure Termination · Martingales · Asymptotic Bounds · Linear Recurrences

# 1 Introduction

*Classical program termination.* Termination is a key property in program analysis [16]. The question whether a program terminates on all possible inputs – the universal halting problem – is undecidable. Proof rules based on ranking functions have been developed that impose sufficient conditions implying (non-)termination. Automated termination checking has given rise to powerful software tools such as AProVE [21] and NaTT [44] (using term rewriting), and UltimateAutomizer [26] (using automata theory). These tools have been shown to be able to determine the termination of several intricate programs. The industrial

This research was supported by the WWTF ICT19-018 grant ProbInG, the ERC Starting Grant SYMCAR 639270, the ERC AdG Grant FRAPPANT 787914, and the Austrian FWF project W1255-N23.

© The Author(s) 2021

N. Yoshida (Ed.): ESOP 2021, LNCS 12648, pp. 491–518, 2021.

https://doi.org/10.1007/978-3-030-72019-3_18


```
(a) x := 10                    (b) x := 10
    while x > 0 do                 while x > 0 do
      x := x+1 [1/2] x-1             x := x-1 [1/2] x+2
    end                            end

(c) x := 0, y := 0             (d) x := 10, y := 0
    while x^2 + y^2 < 100 do       while x > 0 do
      x := x+1 [1/2] x-1             (dependent polynomial updates
      y := y+x [1/2] y-x              of x, y, z; garbled in extraction)
    end                            end
```

Fig. 1: Examples of probabilistic programs in our probabilistic language. Program 1a is a symmetric 1D random walk. The program is almost surely terminating (AST) but not positively almost surely terminating (PAST). Program 1b is not AST. Programs 1c and 1d contain dependent variable updates with polynomial guards and both programs are PAST.

tool Terminator [15] has taken termination proving into practice and is able to prove termination – or even more general liveness properties – of e.g., device driver software. Rather than seeking a single ranking function, it takes a disjunctive termination argument using sets of ranking functions. Other results include termination proving methods for specific program classes such as linear and polynomial programs, see, e.g., [9,24].

*Termination of probabilistic programs.* Probabilistic programs extend sequential programs with the ability to draw samples from probability distributions. They are used, e.g., for encoding randomized algorithms, planning in AI, security mechanisms, and in cognitive science. In this paper, we consider probabilistic while-programs with discrete probabilistic choices, in the vein of the seminal works [34] and [37]. Termination of probabilistic programs differs from the classical halting problem in several respects; e.g., probabilistic programs may exhibit diverging runs that have probability mass zero in total. Such programs do not always terminate, but terminate with probability one – they *almost surely* terminate. An example of such a program is given in Figure 1a, where variable x is incremented by 1 with probability 1/2, and otherwise decremented by 1. This program encodes a one-dimensional (1D) left-bounded random walk starting at position 10. Another important difference to classical termination is that the expected number of program steps until termination may be infinite, even if the program almost surely terminates. Thus, almost sure termination (AST) does not imply that the expected number of steps until termination is finite. Programs that have a finite expected runtime are referred to as *positively almost surely* terminating (PAST). Figure 1c is a sample program that is PAST. While PAST implies AST, the converse does not hold, as evidenced by Figure 1a: that program terminates with probability one but needs infinitely many steps on average to reach x = 0, hence is not PAST. (The terminology AST and PAST was coined in [8] and has its roots in the theory of Markov processes.)
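The AST-but-not-PAST behaviour of Figure 1a can be observed empirically. The following Monte-Carlo sketch is ours (it is not part of AMBER): with a fixed seed, almost all simulated runs of the walk reach 0, yet even a generous step cap is exceeded by a few runs, reflecting the heavy-tailed hitting time behind the infinite expectation.

```python
# Simulate Figure 1a: x := 10; while x > 0 do x := x+1 [1/2] x-1 end.
# AST predicts (almost) every run terminates; not-PAST manifests as a
# heavy tail: some runs exceed even a large step cap.

import random

def walk_steps(x0, cap, rng):
    """Steps for the walk to reach 0 from x0, or None if the cap is hit."""
    x, steps = x0, 0
    while x > 0:
        if steps >= cap:
            return None
        x += 1 if rng.random() < 0.5 else -1
        steps += 1
    return steps

rng = random.Random(42)
runs = [walk_steps(10, 10_000, rng) for _ in range(300)]
finished = [s for s in runs if s is not None]
assert len(finished) > 250        # almost all runs hit 0: consistent with AST
assert len(finished) < len(runs)  # but a few exceed the cap: not PAST
```

Raising the cap only shifts the tail: the empirical mean of the finished runs keeps growing with the cap, as expected when E(T) = ∞.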

*Proof rules for AST and PAST.* Proving termination of probabilistic programs is hard: AST for a single input is as hard as the universal halting problem, whereas PAST is even harder [30]. Termination analysis of probabilistic programs is currently attracting quite some attention. It is not just of theoretical interest. For instance, a popular way to analyze probabilistic programs in machine learning is by using some advanced form of simulation. If, however, a program is not PAST, the simulation may take forever. In addition, the use of probabilistic programs in safety-critical environments [2,7,20] necessitates providing formal guarantees on termination. Different techniques are considered for probabilistic program termination ranging from probabilistic term rewriting [3], sized types [17], and

Büchi automata theory [14], to weakest pre-condition calculi for checking PAST [31]. A large body of works considers *proof rules* that provide sufficient conditions for proving AST, PAST, or their negations. These rules are based on martingale theory, in particular supermartingales. These are stochastic processes that can (in simplified terms) be viewed as the probabilistic analog of ranking functions: the value of a random variable represents the "value" of the function at the beginning of a loop iteration. Successive random variables model the evolution of the program loop. Being a supermartingale means that the expected value of the random variables at the end of a loop iteration does not exceed their value at its start. Constraints on supermartingales form the essential part of proof rules. For example, the AST proof rule in [38] requires the existence of a supermartingale whose value decreases by at least a certain amount with at least a certain probability on each loop iteration. Intuitively speaking, the closer the supermartingale comes to zero – indicating termination – the larger the probability and amount of further decrease that may be required. The AST proof rule in [38] is applicable to prove AST for the program in Figure 1a; yet, it cannot be used to prove PAST of Figures 1c-1d. On the other hand, the PAST proof rule in [10,19] requires that the expected decrease of the supermartingale on each loop iteration is at least some positive constant, and that on loop termination the supermartingale is at most zero – very similar to the usual constraint on ranking functions. While [10,19] can be used to prove the program in Figure 1c to be PAST, these works cannot be used for Figure 1a. They cannot be used for proving Figure 1d to be PAST either. The rule for showing non-AST [13] requires the supermartingale to be repulsing. This intuitively means that the supermartingale decreases on average by at least ε and is positive on termination. Figuratively speaking, it repulses terminating states.
It can be used to prove the program in Figure 1b to be not AST. In summary, while existing works for proving AST, PAST, and their negations are generic in nature, they are each restricted to certain classes of probabilistic programs. *In this paper, we propose relaxed versions of existing proof rules for probabilistic termination that turn out to treat quite a number of programs that could not be proven otherwise (Section 4).* In particular, (non-)termination of all four programs of Figure 1 can be proven using our proof rules.
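The one-step expectation at the heart of all these rules is easy to compute for loops with finitely many branches. The following sketch is ours, for intuition only: it enumerates the branches of a probabilistic update and evaluates the expected change of a candidate (super)martingale exactly.

```python
# Compute E[M(x') - M(x)] for one iteration of a loop whose update has
# finitely many branches, each a (probability, update-function) pair.

def expected_drift(m, branches, x):
    """Exact one-step expected change of candidate martingale m at state x."""
    return sum(p * (m(f(x)) - m(x)) for p, f in branches)

# Figure 1a: x := x+1 [1/2] x-1 with M(x) = x -- drift 0, a (super)martingale
fig1a = [(0.5, lambda x: x + 1), (0.5, lambda x: x - 1)]
assert expected_drift(lambda x: x, fig1a, 10) == 0.0

# Figure 1b: x := x-1 [1/2] x+2 with M(x) = -x -- drift -1/2, and M is
# non-negative exactly on terminating states (x <= 0): repulsing behaviour
fig1b = [(0.5, lambda x: x - 1), (0.5, lambda x: x + 2)]
assert expected_drift(lambda x: -x, fig1b, 10) == -0.5
```

AMBER's actual contribution is to discharge such conditions symbolically for all reachable states, rather than at individual points as done here.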

*Automated termination checking of AST and PAST.* Whereas there is a large body of techniques and proof rules, software tool support to automate checking termination of probabilistic programs is still in its infancy. *This paper presents novel algorithms to automate various proof rules for probabilistic programs:* the three aforementioned proof rules [10,19,38,13] and a variant of the non-AST proof rule to prove non-PAST [13]<sup>3</sup>. We also present relaxed versions of each of the proof rules, going beyond the state-of-the-art in the termination analysis of probabilistic programs. We focus on so-called Prob-solvable loops, extending [4]. Namely, we define Prob-solvable loops as probabilistic while-programs whose guards compare two polynomials (over program variables) and whose body is a sequence of random assignments with polynomials as right-hand sides, such that a variable x, say, only depends on variables preceding x in the loop body. While restrictive, Prob-solvable loops cover a vast set of interesting probabilistic programs (see Remark 1). An essential property of our programs is that the statistical moments of program variables can be obtained as closed-form formulas [4]. *The key of our algorithmic*

<sup>3</sup> For automation, the proof rule of [38] is considered for constant decrease and probability functions.

*approach is a procedure for computing asymptotic lower, upper and absolute bounds on polynomial expressions over program variables in our programs (Section 5).* This enables a novel method for automating probabilistic termination and non-termination proof rules based on (super)martingales, going beyond the state-of-the-art in probabilistic termination. Our relaxed proof rules allow us to fully automate (P)AST analysis by using only polynomial witnesses. Our experiments provide practical evidence that polynomial witnesses within Prob-solvable loops are sufficient to certify most examples from the literature and even beyond (Section 6).

*Our termination tool* AMBER*.* We have implemented our algorithmic approach in the publicly available tool AMBER. It exploits asymptotic bounds over polynomial martingales and uses the tool MORA [4] for computing the first-order moments of program variables, together with the diofant computer algebra package. It employs over- and under-approximations realized by a simple static analysis. AMBER *establishes probabilistic termination in a fully automated manner* and has the following unique characteristics:


An experimental evaluation on various benchmarks shows that: (1) AMBER is superior to existing tools for automating PAST [42] and AST [10], (2) the relaxed proof rules enable proving substantially more programs, and (3) AMBER is able to automate the termination checking of intricate probabilistic programs (within the class of programs considered) that could not be automatically handled so far (Section 6). For example, AMBER *solves 23 termination benchmarks that no other automated approach could so far handle.*

*Main contributions.* To summarize, the main contributions of this paper are:


# 2 Preliminaries

We denote by N and R the sets of natural and real numbers, respectively. Further, let R̄ denote R ∪ {+∞, −∞}, R<sup>+</sup><sub>0</sub> the non-negative reals, and R[x<sub>1</sub>,...,x<sub>m</sub>] the polynomial ring in x<sub>1</sub>,...,x<sub>m</sub> over R. We write x := E<sub>(1)</sub> [p<sub>1</sub>] E<sub>(2)</sub> [p<sub>2</sub>] ... [p<sub>m−1</sub>] E<sub>(m)</sub> for the probabilistic update of program variable x, denoting the execution of x := E<sub>(j)</sub> with probability p<sub>j</sub>, for j = 1,...,m−1, and the execution of x := E<sub>(m)</sub> with probability 1 − (p<sub>1</sub> + ... + p<sub>m−1</sub>), where m ∈ N. We write indices of expressions over program variables in round brackets and use E<sub>i</sub> for the stochastic process induced by expression E. This section introduces our

programming language extending *Prob-solvable loops* [4] and defines the probability space introduced by such programs. Let E denote the expectation operator with respect to a probability space. We assume the reader to be familiar with probability theory [33].
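The branch-selection semantics of the probabilistic update notation can be sketched operationally. This is our own illustration (not AMBER code): branch j fires with probability p<sub>j</sub>, and the last branch with the remaining probability.

```python
# Operational sketch of x := E(1) [p1] E(2) [p2] ... E(m): draw u uniformly
# in [0,1) and pick the branch whose cumulative probability interval covers u;
# the final branch takes the complement probability 1 - (p1 + ... + p_{m-1}).

import random

def prob_update(x, branches, rng):
    """branches: [(p1, f1), ..., (p_{m-1}, f_{m-1}), (None, f_m)]."""
    u = rng.random()
    acc = 0.0
    for p, f in branches[:-1]:
        acc += p
        if u < acc:
            return f(x)
    return branches[-1][1](x)   # complement branch

rng = random.Random(0)
# x := x+1 [1/2] x-1 from Figure 1a, executed from x = 0
step = [(0.5, lambda x: x + 1), (None, lambda x: x - 1)]
samples = [prob_update(0, step, rng) for _ in range(10_000)]
mean = sum(samples) / len(samples)
assert abs(mean) < 0.05   # empirical mean near the exact expectation 0
```

The analyses in this paper never sample: they reason about the exact distribution this sampler induces.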

### 2.1 Programming Model: Prob-Solvable Loops

Prob-solvable loops [4] are syntactically restricted probabilistic programs with polynomial expressions over program variables. The statistical higher-order moments of the program variables of such loops, like expectation and variance, can always be computed as functions of the loop counter. In this paper, we extend Prob-solvable loops with polynomial loop guards in order to study their termination behavior, as follows.

Definition 1 (Prob-solvable loop L). *A* Prob-solvable loop L *with real-valued variables* x<sub>(1)</sub>,...,x<sub>(m)</sub>*, where* m ∈ N*, is a program of the form* I<sub>L</sub> while G<sub>L</sub> do U<sub>L</sub> end*, with*


$$x\_{(j)} := a\_{(j1)}x\_{(j)} + P\_{(j1)} \; \left[p\_{j1}\right] \; a\_{(j2)}x\_{(j)} + P\_{(j2)} \; \left[p\_{j2}\right] \dots \left[p\_{j(l\_j-1)}\right] \; a\_{(jl\_j)}x\_{(j)} + P\_{(jl\_j)}$$

*where* a<sub>(jk)</sub> ∈ R<sup>+</sup><sub>0</sub> *are constants,* P<sub>(jk)</sub> ∈ R[x<sub>(1)</sub>,...,x<sub>(j−1)</sub>] *are polynomials,* p<sub>jk</sub> ∈ [0,1] *and* ∑<sub>k</sub> p<sub>jk</sub> < 1*.*

If L is clear from the context, the subscript L is omitted from IL, GL, and UL. Figure 1 gives four example Prob-solvable loops.

*Remark 1 (Prob-solvable expressiveness).* The enforced order of assignments in the loop body of Prob-solvable loops seems restrictive. However, many non-trivial probabilistic programs can be naturally modeled as succinct Prob-solvable loops. These include complex stochastic processes such as 2D random walks and dynamic Bayesian networks [5]. Almost all existing benchmarks on automated probabilistic termination analysis fall within the scope of Prob-solvable loops (cf. Section 6).
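The ordering restriction of Definition 1 is purely syntactic and easy to check mechanically. The following checker is our own sketch (not part of AMBER): it verifies that the polynomial parts of each variable's update mention only variables that precede it in the loop body.

```python
# Check the Prob-solvable ordering restriction: the polynomials P_(jk) in the
# update of x_(j) may only mention variables preceding x_(j) in the body.

def is_prob_solvable(order, updates):
    """order: variable names in body order; updates: var -> set of other
    variables occurring in the polynomial parts of its update."""
    seen = set()
    for v in order:
        if not updates.get(v, set()) <= seen:
            return False
        seen.add(v)
    return True

# Figure 1c: x's update uses no other variable, y's uses x -- admissible
assert is_prob_solvable(['x', 'y'], {'x': set(), 'y': {'x'}})
# a loop where x's update mentioned the later variable y would be rejected
assert not is_prob_solvable(['x', 'y'], {'x': {'y'}, 'y': set()})
```

This triangular dependency structure is what makes the moments of Prob-solvable loops expressible in closed form [4].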

In the sequel, we consider an arbitrary Prob-solvable loop L and provide all definitions relative to L. The semantics of L is defined next, by associating L with a probability space.

### 2.2 Canonical Probability Space

A probabilistic program, and thus a Prob-solvable loop, can be semantically described as a probabilistic transition system [10] or as a probabilistic control flow graph [13], which in turn induce an infinite Markov chain (MC)<sup>4</sup>. An MC is associated with a *sequence space* [33], a special probability space. In the sequel, we associate L with the sequence space of its corresponding MC, similarly as in [25].

<sup>4</sup> In fact, [13] consider Markov decision processes, but in absence of non-determinism in Prob-solvable loops, Markov chains suffice for our purpose.

Definition 2 (State, Run of L). *The* state *of Prob-solvable loop* L *over* m *variables, is a vector* <sup>s</sup>∈R<sup>m</sup>*. Let* <sup>s</sup>[j] *or* <sup>s</sup>[x(j)] *denote the* <sup>j</sup>*-th component of* <sup>s</sup> *representing the value of the variable* x(j) *in state* s*. A* run ϑ *of* L *is an infinite sequence of states.*

Note that any infinite sequence of states is a run; infeasible runs will, however, be assigned measure 0. We write s ⊨ B to denote that the logical formula B holds in state s.

Definition 3 (Loop Space of L). *The Prob-solvable loop* L *induces a canonical filtered probability space* (ΩL,ΣL,(F<sup>L</sup> <sup>i</sup> )<sup>i</sup>∈N,PL)*, called* loop space*, where*


$$p(s) := \mu\_{\mathcal{I}}(s), \quad p(\pi ss') := \begin{cases} p(\pi s) \cdot [s' = s], \,\text{if } s \models \neg \mathcal{G}\_{\mathcal{L}} \\ p(\pi s) \cdot \mu\_{\mathcal{U}}(s, s'), \,\text{if } s \models \mathcal{G}\_{\mathcal{L}} \end{cases}$$

μ<sub>I</sub>(s) *denotes the probability that, after initialization* I<sub>L</sub>*, the loop* L *is in state* s*.* μ<sub>U</sub>(s, s′) *denotes the probability that, after one loop iteration starting in state* s*, the resulting program state is* s′*.* [·] *represents the Iverson bracket, i.e.* [s′ = s] *is* 1 *iff* s′ = s*.*

Intuitively, <sup>P</sup>(Cyl(π)) is the probability that prefix <sup>π</sup> is the sequence of the first <sup>|</sup>π<sup>|</sup> program states when executing L. Moreover, the σ-algebra F<sup>i</sup> intuitively captures the information about the program run after the loop body U has been executed i times. We note that the effect of the loop body U is considered as atomic.

In order to formalize termination properties of a Prob-solvable loop L, we define the *looping time* of L to be a random variable in L's loop space.

Definition 4 (Looping Time of L). *The* looping time *of* L *is the random variable* T<sup>¬G</sup> : Ω → N ∪ {∞}*, where* T<sup>¬G</sup>(ϑ) := inf{i ∈ N | ϑ<sub>i</sub> ⊨ ¬G}*.*

Intuitively, the looping time T ¬G maps a program run of L to the index of the first state falsifying the loop guard G of L or to ∞ if no such state exists. We now formalize termination properties of L using the looping time T ¬G.

Definition 5 (Termination of <sup>L</sup>). *The Prob-solvable loop* <sup>L</sup> *is* AST *if* <sup>P</sup>(<sup>T</sup> ¬G <sup>&</sup>lt;∞)=1*.* <sup>L</sup> *is PAST if* <sup>E</sup>(<sup>T</sup> ¬G)<∞*.*

### 2.3 Martingales

While for arbitrary probabilistic programs deciding P(T<sup>¬G</sup> < ∞) = 1 and E(T<sup>¬G</sup>) < ∞ is undecidable, sufficient conditions for AST, PAST and their negations have been developed [10,19,38,13]. These works use (super)martingales, which are special stochastic processes. In this section, we adapt the general setting of martingale theory to a Prob-solvable loop L and then formalize sufficient termination conditions for L in Section 3.

Definition 6 (Stochastic Process of L). *Every arithmetic expression* E *over the program variables of* <sup>L</sup> *induces the stochastic process* (Ei)<sup>i</sup>∈<sup>N</sup>*,* <sup>E</sup><sup>i</sup> :<sup>Ω</sup> <sup>→</sup><sup>R</sup> *with* <sup>E</sup>i(ϑ):=E(ϑi)*. For a run* ϑ *of* L*,* Ei(ϑ) *is the evaluation of* E *in the* i*-th state of* ϑ*.*

In the sequel, for a boolean condition B over program variables x of L, we write B<sup>i</sup> to refer to the result of substituting x by x<sup>i</sup> in B.

Definition 7 (Martingales). *Let* (Ω,Σ,(Fi)<sup>i</sup>∈N,P) *be a filtered probability space and* (Mi)i∈<sup>N</sup> *be an integrable stochastic process adapted to* (Fi)i∈<sup>N</sup>*. Then* (Mi)i∈<sup>N</sup> *is a* martingale *if* <sup>E</sup>(M<sup>i</sup>+1 | Fi) = <sup>M</sup><sup>i</sup> *(or equivalently* <sup>E</sup>(M<sup>i</sup>+1−M<sup>i</sup> | Fi)=0*). Moreover,* (Mi)<sup>i</sup>∈<sup>N</sup> *is called a* supermartingale (SM) *if* <sup>E</sup>(M<sup>i</sup>+1 | Fi) <sup>≤</sup> <sup>M</sup><sup>i</sup> *(or equivalently* <sup>E</sup>(M<sup>i</sup>+1−M<sup>i</sup> |Fi)≤0*). For an arithmetic expression* <sup>E</sup> *over the program variables of* <sup>L</sup>*, the conditional expected value* <sup>E</sup>(E<sup>i</sup>+1−E<sup>i</sup> |Fi) *is called* the martingale expression of <sup>E</sup>*.*

# 3 Proof Rules for Probabilistic Termination

While AST and PAST are undecidable in general [30], sufficient conditions, called *proof rules*, for AST and PAST have been introduced, see e.g. [10,19,38,13]. In this section, we survey four proof rules, adapted to Prob-solvable loops. In the sequel, a *pure invariant* is a loop invariant in the classical deterministic sense [27]. Based on the probability space corresponding to L, a pure invariant holds before and after every iteration of L.

### 3.1 Positive Almost Sure Termination (PAST)

The proof rule for PAST introduced in [10] relies on the notion of ranking supermartingales (RSMs). An RSM is a SM that decreases by at least a fixed positive amount on average at every loop iteration. Intuitively, RSMs are the probabilistic analogue of ranking functions for deterministic programs.

Theorem 1 (Ranking-Supermartingale-Rule (RSM-Rule) [10], [19]). *Let* <sup>M</sup> :R<sup>m</sup> <sup>→</sup> <sup>R</sup> *be an expression over the program variables of* <sup>L</sup> *and* <sup>I</sup> *a pure invariant of* <sup>L</sup>*. Assume the following conditions hold for all* <sup>i</sup>∈N*:*

*1. (Termination)* G∧I =⇒ M >0

*2. (RSM Condition)* G<sub>i</sub> ∧ I<sub>i</sub> =⇒ E(M<sub>i+1</sub> − M<sub>i</sub> | F<sub>i</sub>) ≤ −ε*, for some* ε > 0*.*

*Then,* L *is PAST. Further,* M *is called an* ε*-ranking supermartingale.*

*Example 1.* Consider Figure 1c, set M := 100 − x<sup>2</sup> − y<sup>2</sup> and ε := 2, and let I be true. Condition (1) of Theorem 1 trivially holds. Further, M is an ε-ranking supermartingale, as E(M<sub>i+1</sub> − M<sub>i</sub> | F<sub>i</sub>) = 100 − E(x<sup>2</sup><sub>i+1</sub> | F<sub>i</sub>) − E(y<sup>2</sup><sub>i+1</sub> | F<sub>i</sub>) − 100 + x<sup>2</sup><sub>i</sub> + y<sup>2</sup><sub>i</sub> = −2 − x<sup>2</sup><sub>i</sub> ≤ −2. That is because E(x<sup>2</sup><sub>i+1</sub> | F<sub>i</sub>) = x<sup>2</sup><sub>i</sub> + 1 and E(y<sup>2</sup><sub>i+1</sub> | F<sub>i</sub>) = y<sup>2</sup><sub>i</sub> + x<sup>2</sup><sub>i</sub> + 1. Figure 1c is thus proved PAST using the RSM-Rule.
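The closed form in Example 1 can be spot-checked numerically. The sketch below is ours: it enumerates the four equiprobable branch combinations of Figure 1c's body (x updated first, so y's branches read the already-updated x) and compares the exact one-step drift of M against −2 − x².

```python
# Numerical spot-check of Example 1: for Figure 1c with M = 100 - x^2 - y^2,
# the one-step expected change equals -2 - x^2 <= -2 at every state, so M is
# a 2-ranking supermartingale.

from itertools import product

def drift(x, y):
    """E(M_{i+1} - M_i | x_i = x, y_i = y) over the four branch combinations."""
    m = lambda x_, y_: 100 - x_**2 - y_**2
    total = 0.0
    for dx, sy in product((+1, -1), (+1, -1)):
        x1 = x + dx            # x := x+1 [1/2] x-1
        y1 = y + sy * x1       # y := y+x [1/2] y-x, reading the updated x
        total += 0.25 * (m(x1, y1) - m(x, y))
    return total

for x, y in [(0, 0), (3, 4), (-2, 5)]:
    assert drift(x, y) == -2 - x**2   # matches the closed form in Example 1
```

The guard is irrelevant to the drift itself; the RSM-Rule only needs the inequality on states satisfying G ∧ I.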

### 3.2 Almost Sure Termination (AST)

Recall that Figure 1a is AST but not PAST; hence the RSM-Rule cannot be applied to it. By relaxing the ranking conditions, the proof rule in [38] uses general supermartingales to prove AST of programs that are not necessarily PAST.

Theorem 2 (Supermartingale-Rule (SM-Rule) [38]). *Let* $M: \mathbb{R}^m \to \mathbb{R}_{\ge 0}$ *be an expression over the program variables of* $L$ *and* $I$ *a pure invariant of* $L$*. Let* $p: \mathbb{R}_{\ge 0} \to (0,1]$ *(for* probability*) and* $d: \mathbb{R}_{\ge 0} \to \mathbb{R}_{>0}$ *(for* decrease*) be antitone (i.e. monotonically decreasing) functions. Assume the following conditions hold for all* $i \in \mathbb{N}$*:*

*1. (Termination)* $G \wedge I \implies M > 0$

*2. (Decrease)* $G_i \wedge I_i \implies \mathbb{P}(M_{i+1} \le M_i - d(M_i) \mid \mathcal{F}_i) \ge p(M_i)$

*3. (SM Condition)* $G_i \wedge I_i \implies \mathbb{E}(M_{i+1} - M_i \mid \mathcal{F}_i) \le 0$*.*

*Then,* $L$ *is AST.*
Intuitively, the requirement that $d$ and $p$ be antitone forbids the "execution progress" of $L$ towards termination from becoming arbitrarily small while still being positive.

*Example 2.* The SM-Rule can be used to prove AST for Figure 1a. Consider $M := x$, $p := 1/2$, $d := 1$ and $I := true$. Clearly, $p$ and $d$ are antitone. The remaining conditions of Theorem 2 also hold, as (1) $x > 0 \implies x > 0$; (2) $x$ decreases by $d$ with probability $p$ in every iteration; and (3) $\mathbb{E}(M_{i+1} - M_i \mid \mathcal{F}_i) = x_i - x_i \le 0$.
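AST of Figure 1a can also be observed empirically. The simulation below assumes Figure 1a is the symmetric random walk `x := x + 1 [1/2] x - 1` with guard `x > 0` and initial value `x := 1`; these updates are our reconstruction, consistent with conditions (2) and (3) checked in Example 2:

```python
import random

def run_fig1a(x0=1, max_steps=100_000, rng=random):
    """Simulate the symmetric random walk until x <= 0 or the step budget
    runs out. Returns the number of iterations, or None if the budget was
    exhausted (the walk is AST, so this should be rare)."""
    x = x0
    for step in range(max_steps):
        if x <= 0:
            return step
        x += 1 if rng.random() < 0.5 else -1
    return None

rng = random.Random(42)
runs = [run_fig1a(rng=rng) for _ in range(200)]
terminated = [t for t in runs if t is not None]
# AST but not PAST: almost every run terminates, yet the expected looping
# time is infinite, so runtimes vary wildly across runs.
print(f"{len(terminated)}/200 runs terminated within the budget")
```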

### 3.3 Non-Termination

While Theorems 1 and 2 can be used for proving PAST and AST, respectively, they are not applicable to the analysis of non-terminating Prob-solvable loops. Two sufficient conditions for certifying the negations of AST and PAST have been introduced in [13] using so-called *repulsing supermartingales*. Intuitively, a *repulsing supermartingale* $M$ decreases on average in every iteration of $L$ and is non-negative on termination. Figuratively, $M$ repulses terminating states.

Theorem 3 (Repulsing-AST-Rule (R-AST-Rule) [13]). *Let* $M: \mathbb{R}^m \to \mathbb{R}$ *be an expression over the program variables of* $L$ *and* $I$ *a pure invariant of* $L$*. Assume the following conditions hold for all* $i \in \mathbb{N}$*:*

*1.* $\mathbb{P}(M_0 < 0) > 0$

*2.* $\neg G \wedge I \implies M \ge 0$

*3.* $G_i \wedge I_i \implies \mathbb{E}(M_{i+1} - M_i \mid \mathcal{F}_i) \le -\epsilon$*, for some* $\epsilon > 0$

*4.* $|M_{i+1} - M_i| \le c$*, for some* $c > 0$*.*

*Then,* $L$ *is* not *AST.* $M$ *is called an* $\epsilon$-repulsing supermartingale with $c$-bounded differences*.*

*Example 3.* Consider Figure 1b and let $M := -x$, $c := 3$, $\epsilon := 1/2$ and $I := true$. All four above conditions hold: (1) $-x_0 = -10 < 0$; (2) $x \le 0 \implies -x \ge 0$; (3) $\mathbb{E}(M_{i+1} - M_i \mid \mathcal{F}_i) = -x_i - 1/2 + x_i = -1/2 \le -\epsilon$; and (4) $|x_i - x_{i+1}| < 3$. Thus, Figure 1b is not AST.

While Theorem 3 can prove programs not to be AST, and thus also not PAST, it cannot be used to prove programs not to be PAST when they are AST. For example, Theorem 3 cannot be used to prove that Figure 1a is not PAST. To address such cases, a variation of the R-AST-Rule [13] for certifying programs not to be PAST arises by relaxing the condition $\epsilon > 0$ of the R-AST-Rule to $\epsilon \ge 0$. We refer to this variation as the *Repulsing-PAST-Rule (R-PAST-Rule)*.

# 4 Relaxed Proof Rules for Probabilistic Termination

While Theorems 1-3 provide sufficient conditions for proving PAST, AST and their negations, their applicability to Prob-solvable loops is somewhat restricted. For example, the RSM-Rule cannot prove Figure 1d to be PAST using the simple expression $M := x$, as explained in detail in Example 4, but may require more complex witnesses for certifying PAST, complicating automation. In this section, we relax the conditions of Theorems 1-3 by requiring these conditions to only hold "eventually". A property $P(i)$ parameterized by a natural number $i \in \mathbb{N}$ *holds eventually* if there is an $i_0 \in \mathbb{N}$ such that $P(i)$ holds for all $i \ge i_0$. Our relaxations of probabilistic termination proof rules can intuitively be described as follows: if $L$, after a fixed number of steps, almost surely reaches a state from which the program is PAST or AST, then the program is PAST or AST, respectively. Let us first illustrate the benefits of reasoning with "eventually" holding properties for probabilistic termination in the following example.

$$
\begin{array}{l}
x := x_0,\; y := 0 \\
\textbf{while}\ x > 0\ \textbf{do} \\
\quad y := y + 1 \\
\quad x := x + (y - 5)\ \,[1/2]\ \,x - (y - 5) \\
\textbf{end}
\end{array}
\qquad\qquad
\begin{array}{l}
x := 1,\; y := 2 \\
\textbf{while}\ x > 0\ \textbf{do} \\
\quad y := 1/2 \cdot y \\
\quad x := x + 1 - y\ \,[2/3]\ \,x - 1 + y \\
\textbf{end}
\end{array}
$$

$$\text{(a)} \qquad\qquad\qquad\qquad \text{(b)}$$

Fig. 2: Prob-solvable loops which require our relaxed proof rules for termination analysis.

*Example 4 (Limits of the RSM-Rule and SM-Rule).* Consider Figure 1d. Setting $M := x$, we have the martingale expression $\mathbb{E}(M_{i+1} - M_i \mid \mathcal{F}_i) = -y_i^2/2 + y_i + 3/2 = -i^2/2 + i + 3/2$. Since $\mathbb{E}(x_{i+1} - x_i \mid \mathcal{F}_i)$ is non-negative for $i \in \{0,1,2,3\}$, we conclude that $M$ is not an RSM. However, Figure 1d either terminates within the first three iterations or, after three loop iterations, is in a state in which the RSM-Rule is applicable. Therefore, Figure 1d is PAST, but the RSM-Rule cannot directly prove this using $M := x$. A similar restriction of the SM-Rule can be observed for Figure 2a. Considering $M := x$, we derive the martingale expression $\mathbb{E}(x_{i+1} - x_i \mid \mathcal{F}_i) = 0$, implying that $M$ is a martingale for Figure 2a. However, the decrease function $d$ for the SM-Rule cannot be defined because, for example, in the fifth loop iteration of Figure 2a there is no progress, as $x$ is almost surely updated with its previous value. However, after the fifth iteration of Figure 2a, $x$ always decreases by at least $1$ with probability $1/2$, and all conditions of the SM-Rule are satisfied. Thus, Figure 2a either terminates within the first five iterations or reaches a state from which it terminates almost surely. Consequently, Figure 2a is AST, but the SM-Rule cannot directly prove this using $M := x$.
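The first claim of Example 4 can be checked mechanically: the drift of $M := x$ for Figure 1d, $-i^2/2 + i + 3/2$, is non-negative exactly for $i \in \{0,1,2,3\}$ and bounded away from $0$ afterwards. A small sketch using exact rational arithmetic:

```python
from fractions import Fraction

def drift(i):
    """E(x_{i+1} - x_i | F_i) = -i^2/2 + i + 3/2 for Figure 1d with M := x,
    as derived in Example 4."""
    i = Fraction(i)
    return -i * i / 2 + i + Fraction(3, 2)

# Non-negative exactly during the first four iterations ...
assert all(drift(i) >= 0 for i in range(4))
# ... and bounded away from zero afterwards (the drift is decreasing in i,
# with maximum -5/2 at i = 4), so the RSM condition holds *eventually*:
assert all(drift(i) <= Fraction(-5, 2) for i in range(4, 1000))
```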

We therefore relax the RSM-Rule and SM-Rule of Theorems 1 and 2 as follows.

Theorem 4 (Relaxed Termination Proof Rules). *For the RSM-Rule to certify PAST of* $L$*, it is sufficient that conditions* (1)*-*(2) *of Theorem 1 hold eventually (instead of for all* $i \in \mathbb{N}$*). Similarly, for the SM-Rule to certify AST of* $L$*, it is sufficient that conditions* (1)*-*(3) *of Theorem 2 hold eventually.*

*Proof.* We prove the relaxation of the RSM-Rule; the proof of the relaxed SM-Rule is analogous. Let $L := \mathcal{I}\ \textbf{while}\ \mathcal{G}\ \textbf{do}\ \mathcal{U}\ \textbf{end}$ be as in Definition 1. Assume $L$ satisfies conditions (1)-(2) of Theorem 1 after some $i_0 \in \mathbb{N}$. We construct the following probabilistic program $P$, where $i$ is a new variable not appearing in $L$:

$$
\begin{array}{l}
\mathcal{I};\; i := 0 \\
\textbf{while}\ i < i_0\ \textbf{do}\ \mathcal{U};\; i := i + 1\ \textbf{end} \\
\textbf{while}\ \mathcal{G}\ \textbf{do}\ \mathcal{U}\ \textbf{end}
\end{array} \tag{1}
$$

We first argue that if $P$ is PAST, then so is $L$. Assume $P$ to be PAST. Then, by the definition of $P$, the looping time of $L$ is either bounded by $i_0$ or $L$ is PAST. In both cases, $L$ is PAST. Finally, observe that $P$ is PAST if and only if its second while-loop is PAST. The second while-loop of $P$ can be certified to be PAST using the RSM-Rule, additionally using $i \ge i_0$ as an invariant. $\square$

*Remark 2.* The central point of our proof rule relaxations is that they allow for simpler witnesses. While for Example 4 it can be checked that $M := x + 2^y + 5$ is an RSM, the example illustrates that the relaxed proof rule allows for a much simpler PAST witness (linear instead of exponential). This simplicity is key for automation.

Similar to Theorem 4, we relax the R-AST-Rule and the R-PAST-Rule. However, compared to Theorem 4, it is not enough for a non-termination proof rule to certify non-AST from some state onward, because $L$ may never reach this state, as it might terminate earlier. Therefore, a necessary assumption when relaxing non-termination proof rules is that $L$ reaches, with positive probability, the state after which a proof rule witnesses non-termination. This is illustrated in the following example.

*Example 5 (Limits of the R-AST-Rule).* Consider Figure 2b and set $M := -x$. We get $\mathbb{E}(M_{i+1} - M_i \mid \mathcal{F}_i) = y_i/6 - 1/3 = 2^{-i}/3 - 1/3$. Thus, $\mathbb{E}(M_{i+1} - M_i \mid \mathcal{F}_i) = 0$ for $i = 0$, implying that $M$ cannot be an $\epsilon$-repulsing supermartingale with $\epsilon > 0$ for all $i \in \mathbb{N}$. However, after the first iteration of $L$, $M$ satisfies all requirements of the R-AST-Rule. Moreover, $L$ always reaches the second iteration, because in the first iteration $x$ almost surely does not change. It follows that Figure 2b is not AST.
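Reading Figure 2b as `y := 1/2·y; x := x + 1 - y [2/3] x - 1 + y` (our reconstruction of the garbled listing; the branch probability 2/3 is the one consistent with the drift derived in Example 5), the martingale expression for $M := -x$ can be recomputed per state:

```python
from fractions import Fraction

def drift_neg_x(y):
    """Expected one-step change of M = -x in Figure 2b as reconstructed:
    first y is halved, then x := x+1-y [2/3] x-1+y."""
    y_new = Fraction(y) / 2
    # E(x_{i+1} - x_i | F_i) = 2/3*(1 - y_new) + 1/3*(y_new - 1)
    e_dx = Fraction(2, 3) * (1 - y_new) + Fraction(1, 3) * (y_new - 1)
    return -e_dx  # drift of M = -x

# Matches y_i/6 - 1/3 from Example 5: zero at i = 0 (y_0 = 2), then negative.
y = Fraction(2)
for i in range(20):
    assert drift_neg_x(y) == y / 6 - Fraction(1, 3)
    y /= 2
```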

The following theorem formalizes the observation of Example 5, relaxing the R-AST-Rule and R-PAST-Rule of Theorem 3.

Theorem 5 (Relaxed Non-Termination Proof Rules). *For the R-AST-Rule to certify non-AST for* $L$ *(Theorem 3), as well as for the R-PAST-Rule to certify non-PAST for* $L$ *(Theorem 3), if* $\mathbb{P}(M_{i_0} < 0) > 0$ *for some* $i_0 \ge 0$*, it suffices that conditions* (2)*-*(4) *hold for all* $i \ge i_0$ *(instead of for all* $i \in \mathbb{N}$*).*

The proof of Theorem 5 is similar to the one of Theorem 4 and available in [40]. In what follows, whenever we write RSM-Rule, SM-Rule, R-AST-Rule or R-PAST-Rule we refer to our relaxed versions of the proof rules.

# 5 Algorithmic Termination Analysis through Asymptotic Bounds

The *two major challenges when automating reasoning* with the proof rules of Sections 3 and 4 are (i) constructing expressions $M$ over the program variables and (ii) proving inequalities involving $\mathbb{E}(M_{i+1} - M_i \mid \mathcal{F}_i)$. In this section, we address these two challenges for Prob-solvable loops. For the loop guard $G_L \equiv P > Q$, let $\mathcal{G}_L$ denote the polynomial $P - Q$. As before, if $L$ is clear from the context, we omit the subscript $L$. Note that $\mathcal{G} > 0$ is equivalent to the guard $G$.

*(i) Constructing (super)martingales* $M$*:* For a Prob-solvable loop $L$, the polynomial $\mathcal{G}$ is a natural candidate for the expression $M$ in the termination proof rules (RSM-Rule, SM-Rule), and $-\mathcal{G}$ in the non-termination proof rules (R-AST-Rule, R-PAST-Rule). Hence, we construct potential (super)martingales by setting $M := \mathcal{G}$ for the RSM-Rule and the SM-Rule, and $M := -\mathcal{G}$ for the R-AST-Rule and the R-PAST-Rule. The property $G \implies \mathcal{G} > 0$, a condition of the RSM-Rule and the SM-Rule, trivially holds. Moreover, for the R-AST-Rule and the R-PAST-Rule, the condition $\neg G \implies -\mathcal{G} \ge 0$ is satisfied. The remaining conditions of the proof rules are:

- *RSM-Rule:* $\mathbb{E}(\mathcal{G}_{i+1} - \mathcal{G}_i \mid \mathcal{F}_i) \le -\epsilon$, for some $\epsilon > 0$;
- *SM-Rule:* $\mathbb{E}(\mathcal{G}_{i+1} - \mathcal{G}_i \mid \mathcal{F}_i) \le 0$, and $\mathcal{G}$ decreases by $d$ with probability $p$ for antitone $d$ and $p$;
- *R-AST-Rule (with $M = -\mathcal{G}$):* $\mathbb{E}(\mathcal{G}_{i+1} - \mathcal{G}_i \mid \mathcal{F}_i) \ge \epsilon$ for some $\epsilon > 0$, and $|\mathcal{G}_{i+1} - \mathcal{G}_i| \le c$ for some $c > 0$.

All these conditions express bounds over $\mathcal{G}_i$. Choosing $\mathcal{G}$ as the potential witness may seem simplistic. However, Example 4 already illustrated how our relaxed proof rules can mitigate the need for more complex witnesses (even exponential ones). *The computational effort in our approach does not lie in synthesizing a complex witness but in constructing asymptotic bounds for the loop guard.* Our approach can therefore be seen as complementary to approaches synthesizing more complex witnesses [10,11,13]. The martingale expression $\mathbb{E}(\mathcal{G}_{i+1} - \mathcal{G}_i \mid \mathcal{F}_i)$ is an expression over program variables, whereas $\mathcal{G}_{i+1} - \mathcal{G}_i$ cannot be interpreted as a single expression but only through a distribution of expressions.

Definition 8 (One-step Distribution). *For an expression* $H$ *over the program variables of a Prob-solvable loop* $L$*, let the* one-step distribution $\mathcal{U}_L^H$ *be defined by* $E \mapsto \mathbb{P}(H_{i+1} = E \mid \mathcal{F}_i)$*, with support set* $supp(\mathcal{U}_L^H) := \{B \mid \mathcal{U}_L^H(B) > 0\}$*. We refer to the expressions* $B \in supp(\mathcal{U}_L^H)$ *as* branches of $H$*.*

The notation $\mathcal{U}_L^H$ is chosen to suggest that the loop body $\mathcal{U}_L$ is "applied" to the expression $H$, leading to a distribution over expressions. Intuitively, the support $supp(\mathcal{U}_L^H)$ of an expression $H$ contains all possible updates of $H$ after executing a single iteration of $\mathcal{U}_L$.

*(ii) Proving inequalities involving* $\mathbb{E}(M_{i+1} - M_i \mid \mathcal{F}_i)$*:* To automate the termination analysis of $L$ with the proof rules from Section 3, we need to compute bounds for the expression $\mathbb{E}(\mathcal{G}_{i+1} - \mathcal{G}_i \mid \mathcal{F}_i)$ as well as for the branches of $\mathcal{G}$. Moreover, our relaxed proof rules from Section 4 only need asymptotic bounds, i.e. bounds which hold eventually. In Section 5.2, we propose Algorithm 1 for computing *asymptotic lower and upper bounds* for any polynomial expression over the program variables of $L$. Our procedure allows us to derive bounds for $\mathbb{E}(\mathcal{G}_{i+1} - \mathcal{G}_i \mid \mathcal{F}_i)$ and the branches of $\mathcal{G}$. Before formalizing our method, let us first illustrate how reasoning with asymptotic bounds helps to apply termination proof rules to $L$.

*Example 6 (Asymptotic Bounds for the RSM-Rule).* Consider the following program:

$$
\begin{array}{l}
x := 1,\; y := 0 \\
\textbf{while}\ x < 100\ \textbf{do} \\
\quad y := y + 1 \\
\quad x := 2x + y^2\ \,[1/2]\ \,1/2 \cdot x \\
\textbf{end}
\end{array}
$$

Observe $y_i = i$. The martingale expression for $\mathcal{G} = 100 - x$ is $\mathbb{E}(\mathcal{G}_{i+1} - \mathcal{G}_i \mid \mathcal{F}_i) = 1/2 \cdot (100 - 2x_i - (i+1)^2) + 1/2 \cdot (100 - x_i/2) - (100 - x_i) = -x_i/4 - i^2/2 - i - 1/2$. Note that if the term $-x_i/4$ were not present in $\mathbb{E}(\mathcal{G}_{i+1} - \mathcal{G}_i \mid \mathcal{F}_i)$, we could certify the program to be PAST using the RSM-Rule, because $-i^2/2 - i - 1/2 \le -1/2$ for all $i \ge 0$. However, taking a closer look at the variable $x$, we observe that it is *eventually* and almost surely lower bounded by the function $\alpha \cdot 2^{-i}$ for some $\alpha \in \mathbb{R}^+$. Therefore, *eventually* $-x_i/4 \le -\beta \cdot 2^{-i}$ for some $\beta \in \mathbb{R}^+$. Thus, *eventually* $\mathbb{E}(\mathcal{G}_{i+1} - \mathcal{G}_i \mid \mathcal{F}_i) \le -\gamma \cdot i^2$ for some $\gamma \in \mathbb{R}^+$. By our RSM-Rule, the program is PAST.
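The martingale expression of Example 6 can be cross-checked by computing the one-step expectation of $\mathcal{G} = 100 - x$ directly from the two branches of the loop above (a verification sketch in exact arithmetic):

```python
from fractions import Fraction

def g(x):
    """The guard polynomial G = 100 - x."""
    return 100 - x

def expected_drift(x, i):
    """E(G_{i+1} - G_i | F_i) for the loop of Example 6 at state x_i = x,
    y_i = i. The x-update uses the already-incremented y, i.e. y_{i+1} = i+1."""
    y_next = i + 1
    branch1 = g(2 * x + y_next**2)   # taken with probability 1/2
    branch2 = g(Fraction(x, 2))      # taken with probability 1/2
    return Fraction(1, 2) * branch1 + Fraction(1, 2) * branch2 - g(x)

# Matches the closed form -x/4 - i^2/2 - i - 1/2 derived in Example 6:
for x in range(0, 100, 13):
    for i in range(10):
        closed = -Fraction(x, 4) - Fraction(i * i, 2) - i - Fraction(1, 2)
        assert expected_drift(x, i) == closed
```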

Now, the question arises how the asymptotic lower bound $\alpha \cdot 2^{-i}$ for $x$ can be computed automatically. In every iteration, $x$ is either updated with $2x + y^2$ or $1/2 \cdot x$. Considering the updates as recurrences, the inhomogeneous parts are $y^2$ and $0$. Asymptotic lower bounds for these parts are $i^2$ and $0$, respectively, where $0$ is the "asymptotically smallest" one. Taking $0$ as the inhomogeneous part, we construct two recurrences: (1) $l_0 = \alpha$, $l_{i+1} = 2l_i + 0$ and (2) $l_0 = \alpha$, $l_{i+1} = 1/2 \cdot l_i + 0$, for some $\alpha \in \mathbb{R}^+$. The solutions to these recurrences are $\alpha \cdot 2^i$ and $\alpha \cdot 2^{-i}$, where the latter is the desired lower bound because it is "asymptotically smaller". We will formalize this idea of computing asymptotic bounds in Algorithm 1.
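The lower bound itself can be validated by simulation: since `x := 1/2·x` is the branch that shrinks `x` the most and `2x + y²` only increases it, every run of the Example 6 loop satisfies $x_i \ge x_0 \cdot 2^{-i}$. A sketch (the guard is ignored so the bound can be observed along whole trajectories):

```python
import random

def sample_trajectory(steps, rng):
    """One run of the Example 6 updates (guard ignored), returning all x values."""
    x, y, xs = 1.0, 0, [1.0]
    for _ in range(steps):
        y += 1
        x = 2 * x + y * y if rng.random() < 0.5 else x / 2
        xs.append(x)
    return xs

rng = random.Random(0)
for _ in range(100):
    xs = sample_trajectory(40, rng)
    # x_i is bounded below by 2^{-i} (here alpha = x_0 = 1): the worst case
    # is the all-halving path, which attains the bound exactly.
    assert all(x >= 2.0 ** -i for i, x in enumerate(xs))
```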

We next present our method for computing asymptotic bounds over martingale expressions in Sections 5.1-5.2. Based on these asymptotic bounds, in Section 5.3 we introduce algorithmic approaches for our proof rules from Section 4, solving our aforementioned challenges (i)-(ii) in a fully automated manner (Section 5.4).

### 5.1 Prob-solvable Loops and Monomials

Algorithm 1 computes asymptotic bounds on monomials over program variables in a recursive manner. To ensure termination of Algorithm 1, it is important that there are no circular dependencies among monomials. By the definition of Prob-solvable loops, this indeed holds for program variables (monomials of order 1). Every Prob-solvable loop L comes with an ordering on its variables and every variable is restricted to only depend linearly on itself and polynomially on previous variables. Acyclic dependencies naturally extend from single variables to monomials.

Definition 9 (Monomial Ordering). *Let* $L$ *be a Prob-solvable loop with variables* $x_{(1)},...,x_{(m)}$*. Let* $y_1 = \prod_{j=1}^m x_{(j)}^{p_j}$ *and* $y_2 = \prod_{j=1}^m x_{(j)}^{q_j}$*, where* $p_j, q_j \in \mathbb{N}$*, be two monomials over the program variables. The* order $\preceq$ on monomials *over the program variables of* $L$ *is defined by* $y_1 \preceq y_2 \iff (p_m,...,p_1) \le_{lex} (q_m,...,q_1)$*, where* $\le_{lex}$ *is the lexicographic order on* $\mathbb{N}^m$*. The order* $\preceq$ *is total because* $\le_{lex}$ *is total. With* $y_1 \prec y_2$ *we denote* $y_1 \preceq y_2 \wedge y_1 \ne y_2$*.*

To prove acyclic dependencies for monomials we exploit the following fact.

Lemma 1. *Let* $y_1, y_2, z_1, z_2$ *be monomials. If* $y_1 \preceq z_1$ *and* $y_2 \preceq z_2$*, then* $y_1 \cdot y_2 \preceq z_1 \cdot z_2$*.*

By structural induction over monomials and Lemma 1, we establish:

Lemma 2 (Monomial Acyclic Dependency). *Let* $x$ *be a monomial over the program variables of* $L$*. For every branch* $B \in supp(\mathcal{U}_L^x)$ *and every monomial* $y$ *in* $B$*,* $y \preceq x$ *holds.*

Lemma 2 states that the value of a monomial $x$ over the program variables of $L$ only depends on the values of monomials $y$ which precede $x$ in the monomial ordering $\preceq$. This ensures that the dependencies among monomials over the program variables of $L$ are acyclic.

### 5.2 Computing Asymptotic Bounds for Prob-solvable Loops

The structural result on monomial dependencies from Lemma 2 allows for recursive procedures over monomials. This is exploited in Algorithm 1 for computing asymptotic bounds for monomials. The standard Big-O notation does not differentiate between positive and negative functions, as it considers the absolute value of functions. We, however, need to differentiate between functions like $2^i$ and $-2^i$. Therefore, we introduce the notions of *Domination* and *Bounding Functions*.

Definition 10 (Domination). *Let* $F$ *be a finite set of functions from* $\mathbb{N}$ *to* $\mathbb{R}$*. A function* $g: \mathbb{N} \to \mathbb{R}$ dominates $F$ *if eventually* $\alpha \cdot g(i) \ge f(i)$ *for all* $f \in F$ *and some* $\alpha \in \mathbb{R}^+$*. A function* $g: \mathbb{N} \to \mathbb{R}$ *is* dominated by $F$ *if every* $f \in F$ *dominates* $\{g\}$*.*

Intuitively, a function $f$ dominates a function $g$ if $f$ eventually surpasses $g$ modulo a positive constant factor. *Exponential polynomials* are sums of products of polynomials with exponential functions, i.e. functions of the form $\sum_j p_j(x) \cdot c_j^x$, where $c_j \in \mathbb{R}_0^+$. All functions arising in Algorithms 1-4 are exponential polynomials. For a finite set $F$ of exponential polynomials, a function dominating $F$ and a function dominated by $F$ are easily computable with standard techniques, by analyzing the terms of the functions in $F$. With $dominating(F)$ we denote an algorithm computing an exponential polynomial dominating $F$; with $dominated(F)$ we denote an algorithm computing an exponential polynomial dominated by $F$. We assume the functions returned by $dominating(F)$ and $dominated(F)$ to be monotone and either non-negative or non-positive.

*Example 7 (Domination).* The following statements are true: $0$ dominates $\{-i^3 + i^2 + 5\}$; $i^2$ dominates $\{2i^2\}$; $i^2 \cdot 2^i$ dominates $\{i^2 \cdot 2^i + i^9,\ i^5 + i^3,\ 2^{-i}\}$; $i$ is dominated by $\{i^2 - 2i + 1,\ \frac{1}{2}i - 5\}$; and $-2^i$ is dominated by $\{2^i - i^2,\ -10 \cdot 2^{-i}\}$.
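Since domination is a one-sided asymptotic notion, the claims of Example 7 can be spot-checked numerically on a large window of indices with explicit witnesses $\alpha$; the witnesses below are our own choices:

```python
def eventually(pred, start=100, end=2000):
    """Check pred(i) for all i in a large window: a numeric stand-in for
    'there is an i0 such that pred(i) holds for all i >= i0'."""
    return all(pred(i) for i in range(start, end))

# g dominates F if eventually alpha*g(i) >= f(i) for all f in F, some alpha > 0.
# 0 dominates {-i^3 + i^2 + 5}: the polynomial is eventually negative.
assert eventually(lambda i: 0 >= -i**3 + i**2 + 5)
# i^2 dominates {2*i^2}, with witness alpha = 3.
assert eventually(lambda i: 3 * i**2 >= 2 * i**2)
# i^2 * 2^i dominates i^2 * 2^i + i^9, with witness alpha = 2.
assert eventually(lambda i: 2 * i**2 * 2**i >= i**2 * 2**i + i**9)
# i is dominated by {i^2 - 2i + 1, i/2 - 5}: each element dominates i.
assert eventually(lambda i: 1 * (i**2 - 2 * i + 1) >= i)
assert eventually(lambda i: 3 * (i / 2 - 5) >= i)
```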

Definition 11 (Bounding Function for $L$). *Let* $E$ *be an arithmetic expression over the program variables of* $L$*. Let* $l, u: \mathbb{N} \to \mathbb{R}$ *be monotone and non-negative or non-positive functions such that eventually*

*1. (Lower Bound)* $\mathbb{P}(\alpha \cdot l(i) \le E_i \mid T^{\neg G} > i) = 1$*, for some* $\alpha \in \mathbb{R}^+$

*2. (Upper Bound)* $\mathbb{P}(E_i \le \beta \cdot u(i) \mid T^{\neg G} > i) = 1$*, for some* $\beta \in \mathbb{R}^+$*.*

*Then* $l$ *is a* lower bounding function *and* $u$ *an* upper bounding function *for* $E$*. A monotone, non-negative function* $b$ *such that eventually* $\mathbb{P}(|E_i| \le \gamma \cdot b(i) \mid T^{\neg G} > i) = 1$*, for some* $\gamma \in \mathbb{R}^+$*, is an* absolute bounding function *for* $E$*.*

A bounding function imposes a bound on an expression E over the program variables holding eventually, almost surely, and modulo a positive constant factor. Moreover, bounds on E only need to hold as long as the program has not yet terminated.

Given a Prob-solvable loop L and a monomial x over the program variables of L, Algorithm 1 computes a lower and upper bounding function for x. Because every polynomial expression is a linear combination of monomials, the procedure can be used to compute lower and upper bounding functions for any polynomial expression over L's program variables by substituting every monomial with its lower or upper bounding function depending on the sign of the monomial's coefficient. Once a lower bounding function l and an upper bounding function u are computed, an absolute bounding function can be computed by dominating({u,−l}).

In Algorithm 1, candidates for bounding functions are modeled using recurrence relations. Solutions $s(i)$ of these recurrences are closed-form candidates for bounding functions, parameterized by the loop iteration $i$. Algorithm 1 relies on the existence of closed-form solutions of recurrences. While closed-forms of general recurrences do not always exist, for *C-finite recurrences*, i.e. linear recurrences with constant coefficients, closed-forms always exist and are computable [32]. In all occurring recurrences, we consider a monomial over program variables as a single function. Therefore, throughout this section, all recurrences arising from a Prob-solvable loop $L$ in Algorithm 1 are C-finite or can be turned into C-finite recurrences. Moreover, closed-forms $s(i)$ of C-finite recurrences are given by exponential polynomials. Therefore, for any solution $s(i)$ of a C-finite recurrence and any constant $r \in \mathbb{R}$, the following holds:

$$
\exists \alpha, \beta \in \mathbb{R}^+, \exists i\_0 \in \mathbb{N} : \forall i \ge i\_0 : \alpha \cdot s(i) \le s(i+r) \le \beta \cdot s(i). \tag{2}
$$

Intuitively, the property states that constant shifts do not change the asymptotic behavior of s. We use this property at various proof steps in this section. Moreover, we recall that limits of exponential polynomials are computable [23].

For every monomial $x$, every branch $B \in supp(\mathcal{U}_L^x)$ is a polynomial over the program variables. Let $Rec(x) := \{\text{coefficient of } x \text{ in } B \mid B \in supp(\mathcal{U}_L^x)\}$ denote the set of coefficients of the monomial $x$ in all branches of $L$. Let $Inhom(x) := \{B - c \cdot x \mid B \in supp(\mathcal{U}_L^x) \text{ and } c = \text{coefficient of } x \text{ in } B\}$ denote the branches of the monomial $x$ without $x$ and its coefficient. The symbolic constants $c_1$ and $c_2$ in Algorithm 1 represent arbitrary initial values of the monomial $x$ for which bounding functions are computed. The fact that they are symbolic ensures that all potential initial values are accounted for: $c_1$ represents positive initial values and $-c_2$ negative initial values. The symbolic constant $d$ is used in the recurrences to account for the fact that the bounding functions only hold modulo a constant; intuitively, if we use a bounding function in a recurrence, we need to restore the lost constant. $Sign(x)$ is an over-approximation of the sign of the monomial $x$, i.e., if $\exists i: \mathbb{P}(x_i > 0) > 0$, then $+ \in Sign(x)$, and if $\exists i: \mathbb{P}(x_i < 0) > 0$, then $- \in Sign(x)$.
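For a variable $x$, $Rec(x)$ and $Inhom(x)$ can be read off the branch polynomials directly. A minimal sketch with a toy polynomial representation (dicts from monomials to coefficients; not the paper's implementation), instantiated for the Example 6 loop whose branches of $x$ are $2x + y^2$ and $\frac{1}{2}x$:

```python
from fractions import Fraction

# A branch is a dict mapping monomials (here: ('x',), ('y', 'y'), ...) to
# coefficients; missing monomials have coefficient 0.
branches_of_x = [
    {('x',): Fraction(2), ('y', 'y'): Fraction(1)},   # x := 2x + y^2
    {('x',): Fraction(1, 2)},                         # x := 1/2 * x
]

def rec(branches):
    """Rec(x): the coefficients of x itself across all branches."""
    return {b.get(('x',), Fraction(0)) for b in branches}

def inhom(branches):
    """Inhom(x): each branch with its c*x part removed."""
    return [{m: c for m, c in b.items() if m != ('x',)} for b in branches]

assert rec(branches_of_x) == {Fraction(2), Fraction(1, 2)}
assert inhom(branches_of_x) == [{('y', 'y'): Fraction(1)}, {}]   # i.e. {y^2, 0}
```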

Lemma 2, the computability of closed-forms of C-finite recurrences, and the fact that within a Prob-solvable loop only finitely many monomials can occur imply the termination of Algorithm 1. Its correctness is stated in the next theorem.

Theorem 6 (Correctness of Algorithm 1). *The functions* l(i),u(i) *returned by Algorithm 1 on input* L *and* x *are a lower- and an upper bounding function for* x*, respectively.*


```
Input:  A Prob-solvable loop L and a monomial x over L's variables
Output: Lower and upper bounding functions l(i), u(i) for x

 1  inhomBoundsUpper := {upper bounding function of P | P ∈ Inhom(x)}  (recursive call)
 2  inhomBoundsLower := {lower bounding function of P | P ∈ Inhom(x)}  (recursive call)
 3  U(i) := dominating(inhomBoundsUpper)
 4  L(i) := dominated(inhomBoundsLower)
 5  maxRec := max Rec(x)
 6  minRec := min Rec(x)
 7  I := ∅
 8  if + ∈ Sign(x) then I := I ∪ {c1}
 9  if − ∈ Sign(x) then I := I ∪ {−c2}
10  uCand := closed-forms of {y_{i+1} = r·y_i + d·U(i) | r ∈ {minRec, maxRec}, y_0 ∈ I}
11  lCand := closed-forms of {y_{i+1} = r·y_i + d·L(i) | r ∈ {minRec, maxRec}, y_0 ∈ I}
12  u(i) := dominating(uCand)
13  l(i) := dominated(lCand)
14  return l(i), u(i)
```

*Proof.* Intuitively, it has to be shown that regardless of the paths through the loop body taken by any program run, the value of x is always eventually upper bounded by some function in uCand and eventually lower bounded by some function in lCand (almost surely and modulo positive constant factors). We show that x is always eventually upper bounded by some function in uCand. The proof for the lower bounding function is analogous.

Let $\vartheta \in \Sigma$ be a *possible* program run, i.e. $\mathbb{P}(Cyl(\pi)) > 0$ for all finite prefixes $\pi$ of $\vartheta$. Then, for every $i \in \mathbb{N}$, if $T^{\neg G}(\vartheta) > i$, the following holds:

$$\begin{aligned} x\_{i+1}(\vartheta) &= a\_{(1)} \cdot x\_i(\vartheta) + P\_{(1)i}(\vartheta) \text{ or } x\_{i+1}(\vartheta) = a\_{(2)} \cdot x\_i(\vartheta) + P\_{(2)i}(\vartheta), \\ &\quad \text{or} \dots \text{ or } x\_{i+1}(\vartheta) = a\_{(k)} \cdot x\_i(\vartheta) + P\_{(k)i}(\vartheta), \end{aligned}$$

where $a_{(j)} \in Rec(x)$ and $P_{(j)} \in Inhom(x)$ are polynomials over the program variables. Let $u_1(i),...,u_k(i)$ be upper bounding functions of $P_{(1)},...,P_{(k)}$, computed recursively at line 1. Moreover, let $U(i) := dominating(\{u_1(i),...,u_k(i)\})$, $minRec = \min Rec(x)$ and $maxRec = \max Rec(x)$. Let $l_0 \in \mathbb{N}$ be the smallest number such that for all $j \in \{1,...,k\}$ and $i \ge l_0$:

$$\mathbb{P}(P\_{(j)i} \le \alpha\_j \cdot u\_j(i) \, | \, T^{\neg \mathcal{G}} > i) = 1 \text{ for some } \alpha\_j \in \mathbb{R}^+\text{, and} \tag{3}$$

$$u\_j(i) \le \beta \cdot U(i) \text{ for some } \beta \in \mathbb{R}^+ \tag{4}$$

Thus, all inequalities from the bounding functions $u_j$ and the dominating function $U$ hold from $l_0$ onward. Because $U$ is a dominating function, it is by definition either non-negative or non-positive. Assume $U(i)$ to be non-negative; the case in which $U(i)$ is non-positive is symmetric. Using facts (3) and (4), we establish: for the constant $\gamma := \beta \cdot \max_{j=1..k} \alpha_j$, it holds that $\mathbb{P}(P_{(j)i} \le \gamma \cdot U(i) \mid T^{\neg G} > i) = 1$ for all $j \in \{1,...,k\}$ and all $i \ge l_0$. Let $l_1$ be the smallest number such that $l_1 \ge l_0$ and $U(i + l_0) \le \delta \cdot U(i)$ for all $i \ge l_1$ and some $\delta \in \mathbb{R}^+$.

*Case 1,* $x_i$ *is almost surely negative for all* $i \ge l_1$*:* Consider the recurrence relation $y_0 = m$, $y_{i+1} = minRec \cdot y_i + \eta \cdot U(i)$, where $\eta := \max(\gamma, \delta)$ and $m$ is the maximum value of $x_{l_1}(\vartheta)$ among all possible program runs $\vartheta$. Note that $m$ exists because there are only finitely many values $x_{l_1}(\vartheta)$ for possible program runs $\vartheta$. Moreover, $m$ is negative by our case assumption. By induction, we get $\mathbb{P}(x_i \le y_{i-l_1} \mid T^{\neg G} > i) = 1$ for all $i \ge l_1$. Therefore, for a closed-form solution $s(i)$ of the recurrence relation $y_i$, we get $\mathbb{P}(x_i \le s(i - l_1) \mid T^{\neg G} > i) = 1$ for all $i \ge l_1$. We emphasize that $s$ exists and can effectively be computed because $y_i$ is C-finite. Moreover, $s(i - l_1) \le \theta \cdot s(i)$ for all $i \ge l_2$, for some $l_2 \ge l_1$ and some $\theta \in \mathbb{R}^+$. Therefore, $s$ satisfies the bound condition of an upper bounding function. Also, $s$ is present in $uCand$ by choosing the symbolic constants $c_2$ and $d$ to represent $-m$ and $\eta$, respectively. The function $u(i) := dominating(uCand)$, at line 12, dominates $uCand$ (hence also $s$), and is monotone and either non-positive or non-negative. Therefore, $u(i)$ is an upper bounding function for $x$.

*Case 2,* $x_i$ *is not almost surely negative for all* $i \ge l_1$*:* Thus, there is a possible program run $\vartheta'$ such that $x_i(\vartheta') \ge 0$ for some $i \ge l_1$. Let $l_2 \ge l_1$ be the smallest number such that $x_{l_2}(\hat{\vartheta}) \ge 0$ for some possible program run $\hat{\vartheta}$. This number certainly exists, as $x_i(\vartheta')$ is non-negative for some $i \ge l_1$. Consider the recurrence relation $y_0 = m$, $y_{i+1} = maxRec \cdot y_i + \eta \cdot U(i)$, where $\eta := \max(\gamma, \delta)$ and $m$ is the maximum value of $x_{l_2}(\vartheta)$ among all possible program runs $\vartheta$. Note that $m$ exists because there are only finitely many values $x_{l_2}(\vartheta)$ for possible program runs $\vartheta$. Moreover, $m$ is non-negative because $m \ge x_{l_2}(\hat{\vartheta}) \ge 0$. By induction, we get $\mathbb{P}(x_i \le y_{i-l_2} \mid T^{\neg G} > i) = 1$ for all $i \ge l_2$. Therefore, for a solution $s(i)$ of the recurrence relation $y_i$, we get $\mathbb{P}(x_i \le s(i - l_2) \mid T^{\neg G} > i) = 1$ for all $i \ge l_2$. As above, $s$ exists and can effectively be computed because $y_i$ is C-finite. Moreover, $s(i - l_2) \le \theta \cdot s(i)$ for all $i \ge l_3$, for some $l_3 \ge l_2$ and some $\theta \in \mathbb{R}^+$. Therefore, $s$ satisfies the bound condition of an upper bounding function. Also, $s$ is present in $uCand$ by choosing the symbolic constants $c_1$ and $d$ to represent $m$ and $\eta$, respectively. The function $u(i) := dominating(uCand)$, at line 12, dominates $uCand$ (hence also $s$), and is monotone and either non-positive or non-negative. Therefore, $u(i)$ is an upper bounding function for $x$. $\square$

*Example 8 (Bounding functions).* We illustrate Algorithm 1 by computing bounding functions for <sup>x</sup> and the Prob-solvable loop from Example 6: We have Rec(x):={2, <sup>1</sup> <sup>2</sup> } and Inhom(x) = {y<sup>2</sup>,0}. Computing bounding functions recursively for <sup>P</sup> <sup>∈</sup> Inhom(x) = {y<sup>2</sup>,0} is simple, as we can give exact bounds leading to inhomBoundsUpper <sup>=</sup>{<sup>i</sup> <sup>2</sup>,0} and inhomBoundsLower ={i <sup>2</sup>,0}. Consequently, we getU(i)=<sup>i</sup> <sup>2</sup>,L(i)=0, maxRec = 2 and minRec = <sup>1</sup> <sup>2</sup> . With a rudimentary static analysis of the loop, we determine the (exact) over-approximation Sign(x):={+} by observing that x<sup>0</sup> >0 and all P ∈Inhom(x) are strictly positive. Therefore, uCand is the set of closed-form solutions of the recurrences y<sup>0</sup> := c1, y<sup>i</sup>+1 := 2y<sup>i</sup> + d · i <sup>2</sup> and y<sup>0</sup> := c1, y<sup>i</sup>+1 := <sup>1</sup> <sup>2</sup> y<sup>i</sup> + d · i <sup>2</sup>. Similarly, lCand is the set of closed-form solutions of the recurrences y<sup>0</sup> := c1, y<sup>i</sup>+1 := 2y<sup>i</sup> and y<sup>0</sup> := c1, y<sup>i</sup>+1 := <sup>1</sup> <sup>2</sup> yi. Using any algorithm for computing closed-forms of C-finite recurrences, we obtain uCand <sup>=</sup> {c12<sup>i</sup>−di<sup>2</sup>−2di+3d2<sup>i</sup>−3d, c12−<sup>i</sup>+2di<sup>2</sup>−8di−12d2−<sup>i</sup>+12d} and lCand <sup>=</sup>{c12<sup>i</sup> , c12−<sup>i</sup> }. This leads to the upper bounding function <sup>u</sup>(i)=2<sup>i</sup> and the lower bounding function l(i)=2−<sup>i</sup> . The bounding functions l(i) and u(i) can be used to compute bounding functions for expressions containing x linearly by replacing x by l(i) or u(i) depending on the sign of the coefficient of x. 
For instance, eventually and almost surely the following inequality holds: $-\frac{x_i}{4} - \frac{i^2}{2} - i - \frac{1}{2} \leq -\frac{1}{4} \cdot \alpha \cdot 2^{-i} - \frac{i^2}{2} - i - \frac{1}{2}$ for some $\alpha \in \mathbb{R}^+$. The inequality results from replacing $x_i$ by $l(i)$. Therefore, eventually and almost surely $-\frac{x_i}{4} - \frac{i^2}{2} - i - \frac{1}{2} \leq -\beta \cdot i^2$ for some $\beta \in \mathbb{R}^+$. Thus, $-i^2$ is an upper bounding function for the expression $-\frac{x_i}{4} - \frac{i^2}{2} - i - \frac{1}{2}$.

*Remark 3.* Algorithm 1 describes a general procedure for computing bounding functions for special sequences. Figuratively, these are sequences $s$ such that $s_{i+1} = f(s_i, i)$, where in every step the function $f$ is chosen non-deterministically from a fixed set of special functions (corresponding to branches in our case). We reserve the investigation of applications of bounding functions for such sequences beyond the probabilistic setting for future work.

### 5.3 Algorithms for Termination Analysis of Prob-solvable Loops

Using Algorithm 1 to compute bounding functions for polynomial expressions over program variables at hand, we are now able to formalize our algorithmic approaches automating the termination analysis of Prob-solvable loops using the proof rules from Section 4. Given a Prob-solvable loop L and a polynomial expression E over L's variables, we denote with lbf (E), ubf (E) and abf (E) functions computing a lower, upper and absolute bounding function for E respectively. Our algorithmic approach for proving PAST using the RSM-Rule is given in Algorithm 2.

Algorithm 2: Ranking-Supermartingale-Rule for proving PAST

Input: Prob-solvable loop $\mathcal{L}$
Output: If *true*, then $\mathcal{L}$ with $G$ satisfies the RSM-Rule; hence $\mathcal{L}$ is PAST

1. $E := \mathbb{E}(G_{i+1} - G_i \mid \mathcal{F}_i)$
2. $u(i) := \mathit{ubf}(E)$
3. $\mathit{limit} := \lim_{i \to \infty} u(i)$
4. return $\mathit{limit} < 0$
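A minimal Python sketch of Algorithm 2 follows, under the assumption that the upper bounding function $u(i)$ has already been computed and is given as an exponential polynomial. The term encoding and function names are ours for illustration, not AMBER's API.

```python
# Hedged sketch of Algorithm 2 (RSM-Rule). We assume u(i), the upper
# bounding function of E(G_{i+1} - G_i | F_i), is given as an exponential
# polynomial, encoded as a list of (coeff, power, base) terms meaning
# coeff * i**power * base**i, with non-negative bases.

def limit_sign(terms):
    """Sign of lim_{i->oo} of an exponential polynomial (0 if the limit is 0).
    Simplification: assumes no two terms share the same (power, base)."""
    terms = [t for t in terms if t[0] != 0]
    if not terms:
        return 0
    # The dominant term has the largest base; ties are broken by the power.
    coeff, power, base = max(terms, key=lambda t: (t[2], t[1]))
    if base > 1 or (base == 1 and power > 0):
        return (coeff > 0) - (coeff < 0)  # dominant term diverges
    # Everything else decays to 0; only constant terms survive.
    const = sum(c for (c, p, b) in terms if b == 1 and p == 0)
    return (const > 0) - (const < 0)

def rsm_rule(u_terms):
    """Algorithm 2: certify PAST if lim_{i->oo} u(i) < 0."""
    return limit_sign(u_terms) < 0

# Example 9: u(i) = -i^2 has limit -oo, so the loop is certified PAST.
assert rsm_rule([(-1, 2, 1)])
```

Since functions arising from Prob-solvable loops are exponential polynomials, a dominant-term comparison of this kind is all a limit computation needs.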

*Example 9 (Algorithm 2).* Let us illustrate Algorithm 2 with the Prob-solvable loop from Examples 6 and 8. Applying Algorithm 2 on $\mathcal{L}$ leads to $E = -\frac{x_i}{4} - \frac{i^2}{2} - i - \frac{1}{2}$. We obtain the upper bounding function $u(i) := -i^2$ for $E$. Because $\lim_{i \to \infty} u(i) < 0$, Algorithm 2 returns true. This is valid because $u(i)$ having a negative limit witnesses that $E$ is eventually bounded by a negative constant and therefore is eventually an RSM.

We recall that all functions arising from $\mathcal{L}$ are exponential polynomials (see Section 5.2) and that limits of exponential polynomials are computable [23]. Therefore, the termination of Algorithm 2 is guaranteed, and its correctness is stated next.

Theorem 7 (Correctness of Algorithm 2). *If Algorithm 2 returns* true *on input* L*, then* L *with* G<sup>L</sup> *satisfies the RSM-Rule.*

*Proof.* When returning *true* at line 4, we have $\mathbb{P}(E_i \leq \alpha \cdot u(i) \mid T^{\neg G} > i) = 1$ for all $i \geq i_0$ and some $i_0 \in \mathbb{N}$, $\alpha \in \mathbb{R}^+$. Moreover, $u(i) < -\epsilon$ for all $i \geq i_1$, for some $i_1 \in \mathbb{N}$ and some $\epsilon \in \mathbb{R}^+$, by the definition of $\lim$. From this it follows that for all $i \geq \max(i_0, i_1)$, almost surely $G_i \implies \mathbb{E}(G_{i+1} - G_i \mid \mathcal{F}_i) \leq -\alpha \cdot \epsilon$, which means $G$ is eventually an RSM. □

Our approach proving AST using the SM-Rule is captured with Algorithm 3.

# Algorithm 3: Supermartingale-Rule for proving AST

Input: Prob-solvable loop $\mathcal{L}$
Output: If *true*, $\mathcal{L}$ with $G$ satisfies the SM-Rule with constant $d$ and $p$; hence $\mathcal{L}$ is AST

1. $E := \mathbb{E}(G_{i+1} - G_i \mid \mathcal{F}_i)$
2. $u(i) := \mathit{ubf}(E)$
3. if *not eventually* $u(i) \leq 0$ then return false
4. for $B \in \mathit{supp}(\mathcal{U}^G_{\mathcal{L}})$ do
5. $d(i) := \mathit{ubf}(B - G)$
6. $\mathit{limit} := \lim_{i \to \infty} d(i)$
7. if $\mathit{limit} < 0$ then return true
8. end
9. return false
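The control flow of Algorithm 3 can be sketched in plain Python as follows; the bounding functions are assumed to be given as callables, and evaluating at a single large index is a deliberate, crude stand-in for the symbolic "eventually" and limit checks (sound for this sketch only because bounding functions arising from Prob-solvable loops are eventually sign-definite exponential polynomials).

```python
# Sketch of Algorithm 3 (SM-Rule) for proving AST. Names and numeric
# thresholds are illustrative, not AMBER's implementation.

N = 10**6  # "large enough" index standing in for the asymptotic checks

def eventually_nonpositive(f):
    return f(N) <= 0

def has_negative_limit(f):
    return f(N) < -1e-9  # the limit must be strictly below 0

def sm_rule(u, branch_diff_bounds):
    """u: upper bounding function for E(G_{i+1} - G_i | F_i);
    branch_diff_bounds: upper bounding functions of B - G, one per branch B."""
    if not eventually_nonpositive(u):  # G must eventually be a supermartingale
        return False
    # Some branch must eventually decrease G by at least a constant,
    # with the (constant) probability of that branch.
    return any(has_negative_limit(d) for d in branch_diff_bounds)

# Example 10: u(i) = 0 and the branch x_i - y_i + 4 gives d(i) = -i.
assert sm_rule(lambda i: 0, [lambda i: -i]) is True
```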
*Example 10 (Algorithm 3).* Let us illustrate Algorithm 3 for the Prob-solvable loop $\mathcal{L}$ from Figure 2a: Applying Algorithm 3 on $\mathcal{L}$ yields $E \equiv 0$ and $u(i) = 0$. The expression $G$ ($= x$) has two branches. One of them is $x_i - y_i + 4$, which occurs with probability $\frac{1}{2}$. When the for-loop of Algorithm 3 reaches this branch $B = x_i - y_i + 4$ on line 4, it computes the difference $B - G = -y_i + 4$. An upper bounding function for $B - G$ is given by $d(i) = -i$. Because $\lim_{i \to \infty} d(i) < 0$, Algorithm 3 returns true. This is valid because the branch $B$ witnesses that $G$ eventually decreases by at least a constant with probability $\frac{1}{2}$. Therefore, all conditions of the SM-Rule are satisfied and $\mathcal{L}$ is AST.

Theorem 8 (Correctness of Algorithm 3). *If Algorithm 3 returns* true *on input* L*, then* L *with* G<sup>L</sup> *satisfies the SM-Rule with constant* d *and* p*.*

The proofs of Theorem 8 and Theorem 9 are similar to that of Theorem 7 and can be found in [40].

As established in Section 4, the relaxation of the R-AST-Rule requires that there is a positive probability of reaching the iteration $i_0$ after which the conditions of the proof rule hold. Regarding automation, we strengthen this condition by ensuring that there is a positive probability of reaching any iteration, i.e. $\forall i \in \mathbb{N} : \mathbb{P}(G_i) > 0$. Obviously, this implies $\mathbb{P}(G_{i_0}) > 0$. Furthermore, with CanReachAnyIteration($\mathcal{L}$) we denote a computable under-approximation of $\forall i \in \mathbb{N} : \mathbb{P}(G_i) > 0$. That means CanReachAnyIteration($\mathcal{L}$) implies $\forall i \in \mathbb{N} : \mathbb{P}(G_i) > 0$. Our approach for proving non-AST is summarized in Algorithm 4.

*Example 11 (Algorithm 4).* Let us illustrate Algorithm 4 for the Prob-solvable loop $\mathcal{L}$ from Figure 2a: Applying Algorithm 4 on $\mathcal{L}$ leads to $E = \frac{y_i}{6} - \frac{1}{3} = \frac{2^{-i}}{3} - \frac{1}{3}$ and to the upper bounding function $u(i) = -1$ for $E$ on line 2. Therefore, the if-statement on line 3 is not executed, which means $-G$ is eventually an $\epsilon$-repulsing supermartingale. Moreover, with a simple static analysis of the loop, we establish CanReachAnyIteration($\mathcal{L}$) to be true, as there is a positive probability that the loop guard does not decrease. Thus, the if-statement on line 4 is not executed. Also, the if-statement on line 6 is not executed, because $\epsilon(i) = -u(i) = 1$ is constant and therefore in $\Omega(1)$. $E$ eventually decreases by $\epsilon = 1$ (modulo a positive constant factor), because $u(i) = -1$ is an upper bounding function for $E$. We have $\mathit{differences} = \{1 - \frac{y_i}{2},\ 1 + \frac{y_i}{2}\}$. Both expressions in $\mathit{differences}$ have an absolute bounding function of $1$. Therefore, $\mathit{diffBounds} = \{1\}$. As a result, on line 9 we have $c(i) = 1$, which eventually and almost surely is an upper bound on $|-G_{i+1} + G_i|$

Algorithm 4: Repulsing-AST-Rule for proving non-AST

Input: Prob-solvable loop $\mathcal{L}$
Output: If *true*, $\mathcal{L}$ with $-G$ satisfies the R-AST-Rule; hence $\mathcal{L}$ is not AST

1. $E := \mathbb{E}(-G_{i+1} + G_i \mid \mathcal{F}_i)$
2. $u(i) := \mathit{ubf}(E)$
3. if *not eventually* $u(i) \leq 0$ then return false
4. if $\neg$CanReachAnyIteration($\mathcal{L}$) then return false
5. $\epsilon(i) := -u(i)$
6. if $\epsilon(i) \notin \Omega(1)$ then return false
7. $\mathit{differences} := \{B + G \mid B \in \mathit{supp}(\mathcal{U}^{-G}_{\mathcal{L}})\}$
8. $\mathit{diffBounds} := \{\mathit{abf}(d) \mid d \in \mathit{differences}\}$
9. $c(i) := \mathit{dominating}(\mathit{diffBounds})$
10. return $c(i) \in \mathcal{O}(1)$
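The shape of Algorithm 4 can likewise be sketched in Python, again with numeric stand-ins for the symbolic asymptotic checks; the thresholds and names are illustrative only.

```python
# Sketch of Algorithm 4 (R-AST-Rule) for refuting AST. Bounding functions
# are passed as callables; evaluating at a large index is a crude stand-in
# for the symbolic "eventually", Omega(1) and O(1) checks.

N = 10**6

def r_ast_rule(u, can_reach_any_iteration, diff_bounds):
    """u: upper bounding function for E(-G_{i+1} + G_i | F_i);
    diff_bounds: absolute bounding functions of the branch differences."""
    if not u(N) <= 0:                 # -G must eventually be a supermartingale
        return False
    if not can_reach_any_iteration:   # every iteration reachable w/ prob. > 0
        return False
    eps = -u(N)
    if not eps > 1e-9:                # eps(i) must be in Omega(1)
        return False
    c = max(abs(d(N)) for d in diff_bounds)
    return c < 1e9                    # c(i) in O(1): c-bounded differences

# Example 11: u(i) = -1, reachability holds, and diffBounds = {1}.
assert r_ast_rule(lambda i: -1, True, [lambda i: 1]) is True
```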

(modulo a positive constant factor). Therefore, the algorithm returns true. This is correct, as all the preconditions of the R-AST-Rule are satisfied (and therefore L is not AST).

Theorem 9 (Correctness of Algorithm 4). *If Algorithm 4 returns* true *on input* L*, then* L *with* −G<sup>L</sup> *satisfies the R-AST-Rule.*

Because the R-PAST-Rule is a slight variation of the R-AST-Rule, Algorithm 4 can be slightly modified to yield a procedure for the R-PAST-Rule. An algorithm for the R-PAST-Rule is provided in [40].

# 5.4 Ruling out Proof Rules for Prob-Solvable Loops

A question arising when combining our algorithmic approaches from Section 5.3 into a unifying framework is, given a Prob-solvable loop $\mathcal{L}$, which algorithm to apply first for determining $\mathcal{L}$'s termination behavior. In [4] the authors provide an algorithm for computing an algebraically closed form of $\mathbb{E}(M_i)$, where $M$ is a polynomial over $\mathcal{L}$'s variables. The following lemma explains how the expression $\mathbb{E}(M_{i+1} - M_i)$ relates to the expression $\mathbb{E}(M_{i+1} - M_i \mid \mathcal{F}_i)$. The lemma follows from the monotonicity of $\mathbb{E}$.

Lemma 3 (Rule out Rules for $\mathcal{L}$). *Let* $(M_i)_{i \in \mathbb{N}}$ *be a stochastic process. If* $\mathbb{E}(M_{i+1} - M_i \mid \mathcal{F}_i) \leq -\epsilon$*, then* $\mathbb{E}(M_{i+1} - M_i) \leq -\epsilon$*, for any* $\epsilon \in \mathbb{R}^+$*.*

The contrapositive of Lemma 3 provides a criterion to rule out the viability of a given proof rule. For a Prob-solvable loop $\mathcal{L}$, if $\mathbb{E}(G_{i+1} - G_i) \not\leq 0$, then $\mathbb{E}(G_{i+1} - G_i \mid \mathcal{F}_i) \not\leq 0$, meaning $G$ is not a supermartingale. The expression $\mathbb{E}(G_{i+1} - G_i)$ depends only on $i$ and can be computed as $\mathbb{E}(G_{i+1} - G_i) = \mathbb{E}(G_{i+1}) - \mathbb{E}(G_i)$, where the expected value $\mathbb{E}(G_i)$ is computed as in [4]. Therefore, in some cases, proof rules can automatically be deemed nonviable without the need to compute bounding functions.
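As a small illustration of this criterion, the sketch below checks over a finite horizon whether the unconditional expected guard change is ever positive. Here `expected_G` stands for a closed form of $\mathbb{E}(G_i)$ such as the one computed in [4]; the function name and horizon are placeholders of ours.

```python
# Finite-horizon sketch of the rule-out criterion based on Lemma 3.
# expected_G(i) is assumed to be a closed form for E(G_i); if the
# unconditional expected change E(G_{i+1}) - E(G_i) is positive somewhere,
# G cannot be a supermartingale, so the SM-/RSM-based rules are not viable.

def rule_out(expected_G, horizon=100):
    return any(expected_G(i + 1) - expected_G(i) > 0 for i in range(horizon))

# Upward-biased walk: E(x_i) = 10 + i/2 increases, so G = x is ruled out.
assert rule_out(lambda i: 10 + i / 2) is True
# Decreasing first moment: nothing can be ruled out this way.
assert rule_out(lambda i: 10 - i) is False
```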

# 6 Implementation and Evaluation

### 6.1 Implementation

We implemented and combined our algorithmic approaches from Section 5 in the new software tool AMBER, which stands for *Asymptotic Martingale Bounds*. AMBER and all benchmarks are available at https://github.com/probing-lab/amber. AMBER uses MORA [4,6] for computing the first-order moments of program variables and the DIOFANT package<sup>5</sup> as its computer algebra system.

*Computing* dominating *and* dominated. The dominating and dominated procedures used in Algorithms 1 and 4 are implemented by combining standard algorithms for Big-O analysis with bookkeeping of the asymptotic polarity of the input functions. Let us illustrate this. Consider the following two input-output pairs which our implementation would produce: (a) $\mathit{dominating}(\{i^2 + 10,\ 10 \cdot i^5 - i^3\}) = i^5$ and (b) $\mathit{dominating}(\{-i + 50,\ -i^8 + i^2 - 3 \cdot i^3\}) = -i$. For (a), $i^5$ is eventually greater than all functions in the input set modulo a constant factor, because all functions in the input set are $\mathcal{O}(i^5)$. Therefore, $i^5$ dominates the input set. For (b), the first function is $\mathcal{O}(i)$ and the second is $\mathcal{O}(i^8)$. In this case, however, both functions are eventually negative. Therefore, $-i$ is a function dominating the input set. Importantly, an exponential polynomial $\sum_j p_j(i) \cdot c_j^i$, where $c_j \in \mathbb{R}^+_0$, will always eventually be either only positive or only negative (or $0$ if identical to $0$).
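A toy version of this bookkeeping, restricted to polynomials and using our own `(coeff, power)` term encoding, could look as follows; AMBER's actual implementation works on full exponential polynomials.

```python
# Illustrative sketch of `dominating` for polynomials given as lists of
# (coeff, power) terms. It tracks asymptotic polarity as described above:
# for eventually negative inputs, the slowest-growing negative monomial
# dominates (modulo a constant factor); otherwise the largest power wins.

def leading(terms):
    """Leading (coeff, power) term, i.e. the one with the largest power."""
    terms = [t for t in terms if t[0] != 0]
    return max(terms, key=lambda t: t[1]) if terms else (0, 0)

def dominating(funcs):
    leads = [leading(f) for f in funcs]
    if all(c < 0 for c, _ in leads):
        # All inputs eventually negative: slowest decay dominates.
        return (-1, min(p for _, p in leads))
    # Otherwise a positive monomial of the largest power dominates.
    return (1, max(p for _, p in leads))

# (a) dominating({i^2 + 10, 10*i^5 - i^3}) = i^5, encoded as (1, 5)
assert dominating([[(1, 2), (10, 0)], [(10, 5), (-1, 3)]]) == (1, 5)
# (b) dominating({-i + 50, -i^8 + i^2 - 3*i^3}) = -i, encoded as (-1, 1)
assert dominating([[(-1, 1), (50, 0)], [(-1, 8), (1, 2), (-3, 3)]]) == (-1, 1)
```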

*Sign Over-Approximation.* The over-approximation $\mathit{Sign}(x)$ of the signs of a monomial $x$ used in Algorithm 1 is implemented by a simple static analysis: For a monomial $x$ consisting solely of even powers, $\mathit{Sign}(x) = \{+\}$. For a general monomial $x$, if $x_0 \geq 0$ and all monomials on which $x$ depends, together with their associated coefficients, are always non-negative, then $- \notin \mathit{Sign}(x)$. For example, if $\mathit{supp}(\mathcal{U}^x_{\mathcal{L}}) = \{x_i + 2y_i - 3z_i,\ x_i + u_i\}$, then $- \notin \mathit{Sign}(x)$ if $x_0 \geq 0$ as well as $- \notin \mathit{Sign}(y)$, $+ \notin \mathit{Sign}(z)$ and $- \notin \mathit{Sign}(u)$. Otherwise, $- \in \mathit{Sign}(x)$. The over-approximation for $+ \in \mathit{Sign}(x)$ is analogous.

*Reachability Under-Approximation.* CanReachAnyIteration($\mathcal{L}$), used in Algorithm 4, needs to satisfy the property that if it returns true, then loop $\mathcal{L}$ reaches any iteration with positive probability. In AMBER, we implement this under-approximation as follows: CanReachAnyIteration($\mathcal{L}$) is true if there is a branch $B$ of the loop guard polynomial $G^{\mathcal{L}}$ such that $B - G^{\mathcal{L}}_i$ is non-negative for all $i \in \mathbb{N}$. Otherwise, CanReachAnyIteration($\mathcal{L}$) is false. In other words, if CanReachAnyIteration($\mathcal{L}$) is true, then in any iteration there is a positive probability of $G^{\mathcal{L}}$ not decreasing.

*Bound Computation Improvements.* In addition to Algorithm 1, which computes bounding functions for monomials of program variables, AMBER implements the following refinements:


Whenever the above enhancements are applicable, AMBER prefers them over Algorithm 1.

<sup>5</sup> https://github.com/diofant/diofant

### 6.2 Experimental Setting and Results

*Experimental Setting and Comparisons* Regarding programs which are PAST, we compare AMBER against the tool ABSYNTH [42] and the tool in [10] which we refer to as MGEN. ABSYNTH uses a system of inference rules over the syntax of probabilistic programs to derive bounds on the expected resource consumption of a program and can, therefore, be used to certify PAST. In comparison to AMBER, ABSYNTH requires the degree of the bound to be provided upfront. Moreover, ABSYNTH cannot refute the existence of a bound and therefore cannot handle programs that are not PAST. MGEN uses linear programming to synthesize linear martingales and supermartingales for probabilistic transition systems with linear variable updates. To certify PAST, we extended MGEN [10] with the SMT solver Z3 [41] in order to find or refute the existence of conical combinations of the (super)martingales derived by MGEN which yield RSMs.

With AMBER-LIGHT we refer to a variant of AMBER without the relaxations of the proof rules introduced in Section 4. That is, with AMBER-LIGHT the conditions of the proof rules need to hold for all $i \in \mathbb{N}$, whereas with AMBER the conditions are allowed to only hold eventually. For all benchmarks, we compare AMBER against AMBER-LIGHT to show the effectiveness of the respective relaxations. For each experimental table (Tables 1–3), ✓ symbolizes that the respective tool successfully certified PAST/AST/non-AST for the given program; ✗ means it failed to certify PAST/AST/non-AST. Further, **NA** indicates the respective tool failed to certify PAST/AST/non-AST because the given program is out of scope of the tool's capabilities. Every benchmark has been run on a machine with a 2.2 GHz Intel i7 (Gen 6) processor and 16 GB of RAM and finished within a timeout of 50 seconds, where most benchmarks terminated within a few seconds.

*Benchmarks* We evaluated AMBER against 38 probabilistic programs. We present our experimental results by separating our benchmarks within three categories: (i) 21 programs which are PAST (Table 1), (ii) 11 programs which are AST (Table 2) but not necessarily PAST, and (iii) 6 programs which are not AST (Table 3). The benchmarks have either been introduced in the literature on probabilistic programming [42,10,4,22,38], are adaptations of well-known stochastic processes or have been designed specifically to test unique features of AMBER, like the ability to handle polynomial real arithmetic.

The 21 PAST benchmarks consist of 10 programs representing the original benchmarks of MGEN [10] and ABSYNTH [42], augmented with 11 additional probabilistic programs. Not all benchmarks of MGEN and ABSYNTH could be used for our comparison, as MGEN and ABSYNTH target related but different computation tasks than certifying PAST. Namely, MGEN aims to synthesize (super)martingales, but not ranking ones, whereas ABSYNTH focuses on computing bounds on the expected runtime. Therefore, we adopted *all* 50 benchmarks from [10] (11 benchmarks) and [42] (39 benchmarks) and kept those for which the termination behavior is non-trivial. A benchmark is trivial regarding PAST if either (i) there is no loop, (ii) the loop is bounded by a constant, or (iii) the program is meant to run forever. Moreover, we removed benchmarks for which the witness for PAST is just a trivial combination of witnesses of already included programs. For instance, the benchmarks of [42] contain multiple programs that are concatenations of constant biased random walks. These are relevant benchmarks when evaluating ABSYNTH for discovering bounds, but would blur the picture when comparing against AMBER for PAST certification. With

Table 1: 21 programs which are PAST.

these criteria, 10 out of the 50 original benchmarks of [10] and [42] remain. We add 11 additional benchmarks which have either been introduced in the literature on probabilistic programming [4,22,38], are adaptations of well-known stochastic processes or have been designed specifically to test unique features of AMBER. Notably, out of the 50 original benchmarks from [42] and [10], only 2 remain which are included in our benchmarks and which AMBER cannot prove PAST (because they are not Prob-solvable). All our benchmarks are available at https://github.com/probing-lab/amber.

*Experiments with PAST – Table 1:* Out of the 21 PAST benchmarks, AMBER certifies 18 programs. AMBER cannot handle the benchmarks *nested\_loops* and *sequential\_loops*, as these examples use nested or sequential loops and thus are not expressible as Prob-solvable loops. The benchmarks *exponential\_past\_1* and *exponential\_past\_2* are out of scope of ABSYNTH because they require real numbers, while ABSYNTH can only handle integers. MGEN+Z3 cannot handle benchmarks containing non-linear variable updates or nonlinear guards. Table 1 shows that AMBER outperforms both ABSYNTH and MGEN+Z3 for Prob-solvable loops, even when our relaxed proof rules from Section 4 are not used. Yet, our experiments show that our relaxed proof rules enable AMBER to certify 6 examples to be PAST, which could not be proved without these relaxations by AMBER-LIGHT.

*Experiments with AST – Table 2:* We compare AMBER against AMBER-LIGHT on 11 benchmarks which are AST but not necessarily PAST and also cannot be split into PAST subprograms. Therefore, the SM-Rule is needed to certify AST. To the best of our knowledge, AMBER is the first tool able to certify AST for such programs. Existing approaches like [1] and [14] can only witness AST for non-PAST programs, if - intuitively speaking - the programs contain subprograms which are PAST. Therefore, we compared

AMBER only against AMBER-LIGHT on this set of examples. The benchmark *symmetric\_2d\_random\_walk*, which AMBER fails to certify as AST, models the symmetric random walk in $\mathbb{R}^2$ and is still out of reach of current automation techniques. In [38] the authors mention that a closed-form expression $M$ and functions $p$ and $d$ satisfying the conditions of the SM-Rule have not been discovered yet. The benchmark *fair\_in\_limit\_random\_walk* involves non-constant probabilities and can therefore not be modeled as a Prob-solvable loop.

*Experiments with non-AST – Table 3:* We compare AMBER against AMBER-LIGHT on 6 benchmarks which are not AST. To the best of our knowledge, AMBER is the first tool able to certify non-AST for such programs, and thus we compared AMBER only against AMBER-LIGHT. In [13], where the notion of repulsing supermartingales and the R-AST-Rule are introduced, the authors also propose automation techniques. However, the authors of [13] claim that their "experimental results are basic" and their computational methods are evaluated on only 3 examples, without having any available tool support. For the benchmarks in Table 3, the outcomes of AMBER and AMBER-LIGHT coincide. The reason for this is R-AST-Rule's condition that the martingale expression has to have c-bounded differences. This condition forces a suitable martingale expression to be bounded by a linear function, which is also the reason why AMBER cannot certify the benchmark *polynomial\_nast*.

*Experimental Summary* Our results from Tables 1-3 demonstrate that:


# 7 Related Work

*Proof Rules for Probabilistic Termination* Several proof rules have been proposed in the literature to provide sufficient conditions for the termination behavior of probabilistic programs. The work of [10] uses martingale theory to characterize *positive almost sure termination (PAST)*. In particular, the notion of a ranking supermartingale (RSM) is introduced together with a proof rule (RSM-Rule) to certify PAST, as discussed in Section 3.1. The approach of [19] extended this method to include (demonic) non-determinism and continuous probability distributions, showing the completeness of the RSM-Rule for this program class. The compositional approach proposed in [19] was further strengthened in [29] to a sound approach using the notion of *descent supermartingale map*. In [1], the authors introduced *lexicographic* RSMs.

The SM-Rule discussed in Section 3.2 was introduced in [38]. It is worth mentioning that this proof rule is also applicable to non-deterministic probabilistic programs. The work of [28] presented an independent proof rule based on supermartingales with lower bounds on conditional absolute differences. Both proof rules are based on supermartingales and


Table 2: 11 programs which are AST and not necessarily PAST.

Table 3: 6 programs which are not AST.

can certify AST for programs that are not necessarily PAST. The approach of [43] examined martingale-based techniques for obtaining bounds on reachability probabilities— and thus termination probabilities— from an order-theoretic viewpoint. The notions of *nonnegative repulsing supermartingales* and γ*-scaled submartingales*, accompanied by sound and complete proof rules, have also been introduced. The R-AST-Rule from Section 3.3 was proposed in [13] mainly for obtaining bounds on the probability of stochastic invariants.

An alternative approach is to exploit weakest precondition techniques for probabilistic programs, as presented in the seminal works [34,35] that can be used to certify AST. The work of [37] extended this approach to programs with non-determinism and provided several proof rules for termination. These techniques are purely syntax-based. In [31] a weakest precondition calculus for obtaining bounds on expected termination times was proposed. This calculus comes with proof rules to reason about loops.

*Automation of Martingale Techniques* The work of [10] proposed an automated procedure — by using Farkas' lemma — to synthesize *linear* (super)martingales for probabilistic programs with linear variable updates. This technique was considered in our experimental evaluation, cf. Section 6. The algorithmic construction of supermartingales was extended to treat (demonic) non-determinism in [12] and to polynomial supermartingales in [11] using semi-definite programming. The recent work of [14] uses ω-regular decomposition to certify AST. They exploit so-called *localized* ranking supermartingales, which can be synthesized efficiently but must be linear.

*Other Approaches* Abstract interpretation is used in [39] to prove the probabilistic termination of programs for which the probability of taking a loop k times decreases at least exponentially with k. In [18], a sound and complete procedure deciding AST is given for probabilistic programs with a finite number of reachable states from any initial state. The work of [42] gave an algorithmic approach based on potential functions for computing bounds on the expected resource consumption of probabilistic programs. In [36], model checking is exploited to automatically verify whether a parameterized family of probabilistic concurrent systems is AST.

Finally, the class of Prob-solvable loops considered in this paper extends [4] to a wider class of loops. While [4] focused on computing statistical higher-order moments, our work addresses the termination behavior of probabilistic programs. The related approach of [22] computes exact expected runtimes of constant probability programs and provides a decision procedure for AST and PAST for such programs. Our programming model strictly generalizes the constant probability programs of [22], by supporting polynomial loop guards, updates and martingale expressions.

# 8 Conclusion

This paper reported on the automation of termination analysis of probabilistic while-programs whose guards and expressions are polynomial expressions. To this end, we introduced mild relaxations of existing proof rules for AST, PAST, and their negations, by requiring their sufficient conditions to hold only eventually. The key to our approach is that the structural constraints of Prob-solvable loops allow for automatically computing almost sure asymptotic bounds on polynomials over program variables. Prob-solvable loops cover a vast set of complex and relevant probabilistic processes, including random walks and dynamic Bayesian networks [5]. Only two out of 50 benchmarks in [10,42] are outside the scope of Prob-solvable loops regarding PAST certification. The almost sure asymptotic bounds were used to formalize algorithmic approaches for proving AST, PAST, and their negations. Moreover, for Prob-solvable loops, four different proof rules from the literature uniformly come together in our work.

Our approach is implemented in the software tool AMBER (github.com/probing-lab/amber), offering a fully automated approach to probabilistic termination. Our experimental results show that our relaxed proof rules enable proving probabilistic (non-)termination of more programs than could be treated before. A comparison to the state-of-the-art in automated analysis of probabilistic termination reveals that AMBER significantly outperforms related approaches. To the best of our knowledge, AMBER is the first tool to automate AST, PAST, non-AST and non-PAST in a single tool-chain.

There are several directions for future work. These include extensions to Prob-solvable loops such as symbolic distributions, more complex control flow, and non-determinism. We will also consider program transformations that translate programs into our format. Extensions of the SM-Rule algorithm with non-constant probability and decrease functions are also in our interest.

# References


### 518 M. Moosbrugger et al.

44. Yamada, A., Kusakari, K., Sakabe, T.: Nagoya termination tool. In: Proc. of RTA-TLCA (2014). https://doi.org/10.1007/978-3-319-08918-8\_32

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Bayesian strategies: probabilistic programs as generalised graphical models**

Hugo Paquet

Department of Computer Science, University of Oxford, Oxford, UK hugo.paquet@cs.ox.ac.uk

**Abstract.** We introduce Bayesian strategies, a new interpretation of probabilistic programs in game semantics. This interpretation can be seen as a refinement of Bayesian networks.

Bayesian strategies are based on a new form of event structure, with two causal dependency relations respectively modelling control flow and data flow. This gives a graphical representation for probabilistic programs which resembles the concrete representations used in modern implementations of probabilistic programming.

From a theoretical viewpoint, Bayesian strategies provide a rich setting for denotational semantics. To demonstrate this we give a model for a general higher-order programming language with recursion, conditional statements, and primitives for sampling from continuous distributions and trace re-weighting. This is significant because Bayesian networks do not easily support higher-order functions or conditionals.

# **1 Introduction**

One promise of probabilistic programming languages (PPLs) is to make Bayesian statistics accessible to anyone with a programming background. In a PPL, the programmer can express complex statistical models clearly and precisely, and they additionally gain access to the set of inference tools provided by the probabilistic programming system, which they can use for simulation, data analysis, etc. Such tools are usually designed so that the user does not require any in-depth knowledge of Bayesian inference algorithms.

A challenge for language designers is to provide efficient inference algorithms. This can be intricate, because programs can be arbitrarily complex, and inference requires a close interaction between the inference engine and the language interpreter [42, Ch. 6]. In practice, many modern inference engines do not manipulate the program syntax directly but instead exploit some representation of it, more suited to the type of inference method at hand (Metropolis-Hastings (MH), Sequential Monte Carlo (SMC), Hamiltonian Monte Carlo, variational inference, etc.).

While many authors have recently given proofs of correctness for inference algorithms (see for example [11,24,32]), most have focused on idealised descriptions of the algorithms, based on syntax or operational semantics, rather than on the concrete program representations used in practice. In this paper we instead put forward a mathematical semantics for probabilistic programs designed to provide reasoning tools for existing implementations of inference.

Our work targets a specific class of representations which we call data flow representations. We understand **data flow** as describing the dependence relationships between random variables of a program. This is in contrast with **control flow**, which describes in what order samples are performed. Such data flow representations are widely used in practice. We give a few examples. For Metropolis-Hastings inference, Church [30] and Venture [41] manipulate dependency graphs for random variables ("computation traces" or "probabilistic execution traces"); Infer.NET [22] compiles programs to factor graphs in order to apply message passing algorithms; for a subset of well-behaved programs, Gen [23] statically constructs a representation based on certain combinators which is then exploited by a number of inference algorithms; and finally, for variational inference, Pyro [9] and Edward [55] rely on data flow graphs for efficient computation of gradients by automatic differentiation. (Also [52,28].)

In this paper, we make a step towards correctness of these implementations and introduce **Bayesian strategies**, a new representation based on Winskel's event structures [46] which tracks both data flow and control flow. The Bayesian strategy corresponding to a program is obtained compositionally as is standard in concurrent game semantics [63], and provides an intensional foundation for probabilistic programs, complementary to existing approaches [24,57].

This paper was inspired by the pioneering work of Ścibior et al. [53], which provides the first denotational analysis for concrete inference representations. In particular, their work provides a general framework for proving correct inference algorithms based on static representations. But the authors do not show how their framework can be used to accommodate data flow representations or verify any of the concrete implementations mentioned above. The work of this paper does not fill this gap, as we make no attempt to connect our semantic constructions with those of [53], or indeed to prove correct any inference algorithms. This could be difficult, because our presentation arises out of previous work on game semantics and thus does not immediately fit in with the monadic techniques employed in [53]. Nonetheless, efforts to construct game semantics monadically are underway [14], and it is hoped that the results presented here will set the ground for the development of event structure-based validation of inference.

### **1.1 From Bayesian networks to Bayesian strategies**

Consider the following basic model, found in the Pyro tutorials (and also used in [39]), used to infer the weight of an object based on two noisy measurements. The measurements are represented by random variables meas<sub>1</sub> and meas<sub>2</sub>, whose values are drawn from a normal distribution around the true weight (weight), whose prior distribution is also normal, and centered at 2. (In this situation, meas<sub>1</sub> and meas<sub>2</sub> are destined to be conditioned on actual observed values, and the problem is then to infer the posterior distribution of weight based on these observations. We leave out conditioning in this example and focus on the model specification.)

To describe this model it is convenient to use a Bayesian network, i.e. a DAG of random variables in which the distribution of each variable depends only on the value of its parents:

The same probabilistic model can be encoded in an ML-style language:

```
let weight = sample_weight normal(2, 1) in
sample_meas1 normal(weight, 0.1);
sample_meas2 normal(weight, 0.1);
()
```
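For intuition, the same generative process can be sketched as an ordinary sampler in Python (a sketch of ours, not part of the paper; the function name `model` and the use of the standard library's `random.gauss` are our choices):

```python
# A minimal Python sketch of the noisy-scales model above (illustrative only).
import random

def model():
    # weight has a normal prior centred at 2
    weight = random.gauss(2, 1)
    # two noisy measurements, each drawn around the true weight
    meas1 = random.gauss(weight, 0.1)
    meas2 = random.gauss(weight, 0.1)
    return weight, meas1, meas2

# Data flow: meas1 and meas2 each depend on weight but not on each other,
# so swapping the two measurement lines leaves the joint distribution unchanged.
w, m1, m2 = model()
```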
Our choice of sampling meas<sub>1</sub> before meas<sub>2</sub> is arbitrary: the same program with the second and third lines swapped corresponds to the same probabilistic model. This redundancy is unavoidable because programs are inherently sequential. It is the purpose of "commutative" semantics for probabilistic programs, as introduced by Staton et al. [54,57], to clarify this situation. They show that reordering program lines does not change the semantics, even in the presence of conditioning. This result says that when specifying a probabilistic model, only data flow matters, and not control flow. This motivates the use of program representations based on data flow such as the examples listed above.

In our game semantics, a probabilistic program is interpreted as a control flow graph annotated by a data dependency relation. The Bayesian strategy associated with the program above is as follows:

where (in brief) one arrow relation represents data flow, the other represents control flow, and the dashed node is the program output. (Probability distributions are as in the Bayesian network.)

The semantics is not commutative, simply because reordering lines affects control flow; we emphasise that the point of this work is not to prove any new program equations, but instead to provide a formal framework for the representations involved in practical inference settings.

### **1.2 Our approach**

To formalise this idea we use event structures, which naturally model control flow, enriched with additional structure for probability and an explicit data flow relation. Event structures were used in previous work by the author and Castellan on probabilistic programming [18], and were shown to be a good fit for reasoning about MH inference. But the representation in [18] combines data flow and control flow in a single transitive relation, and thus suffers from important limitations. The present paper is a significant improvement: by maintaining a clear separation between control flow and data flow, we can reframe the ideas in the well-established area of concurrent game semantics [63], which enables an interpretation of recursion and higher-order functions; these were not considered in [18]. Additionally, here we account for the fact that data flow in probabilistic programming is not in general a transitive relation.

While some work is needed to set up the right notion of event structure, the standard methods of concurrent game semantics adapt well to this setting. This is not surprising, as event structures and games are known to accommodate the addition of extra structure well, see e.g. [21,5,15]. One difficulty is to correctly define composition, keeping track of potential hidden data dependencies. In summary:


Paper outline. We start by recalling the basics of probability and Bayesian networks, and we then describe the syntax of our language (Sec. 2). In Sec. 3, we introduce event structures and Bayesian event structures, and informally describe our semantics using examples. In Sec. 4 we define our category of arenas and strategies, which we apply to the denotational semantics of the language in Sec. 5. We give some context and perspectives in Sec. 6.

Acknowledgements. I am grateful to Simon Castellan, Mathieu Huot and Philip Saville for helpful comments on early versions of this paper. This work was supported by grants from EPSRC and the Royal Society.

# **2 Probability distributions, Bayesian networks, and probabilistic programming**

# **2.1 Probability and measure**

We recall the basic notions, see e.g. [8] for a reference.

Measures. A **measurable space** is a set X equipped with a σ**-algebra**, that is, a set Σ<sub>X</sub> of subsets of X containing X itself, and closed under complements and countable unions. The elements of Σ<sub>X</sub> are called **measurable subsets** of X. An important example of measurable space is the set R equipped with its σ-algebra Σ<sub>R</sub> of Borel sets, the smallest one containing all intervals. Another basic example is the discrete space N, in which all subsets are measurable.

A **measure** on (X, Σ<sub>X</sub>) is a function μ : Σ<sub>X</sub> → [0, ∞] which is countably additive, i.e. $\mu(\bigcup\_{i\in I} U\_i) = \sum\_{i\in I} \mu(U\_i)$ for I countable and the U<sub>i</sub> pairwise disjoint, and satisfies μ(∅) = 0. A fundamental example is the **Lebesgue measure** λ on R, defined on intervals as λ([a, b]) = b − a and extended to all Borel sets. Another example (for arbitrary X) is the **Dirac measure** at a point x ∈ X: for any U ∈ Σ<sub>X</sub>, δ<sub>x</sub>(U) = 1 if x ∈ U, 0 otherwise. A **sub-probability measure** on (X, Σ<sub>X</sub>) is a measure μ satisfying μ(X) ≤ 1.

A function f : X → Y is measurable if U ∈ Σ<sub>Y</sub> implies f<sup>−1</sup>U ∈ Σ<sub>X</sub>. Given a measure μ on a space X and a non-negative measurable function f : X → R, for every measurable subset U of X we can define the **integral** $\int\_U f\,\mathrm{d}\mu$, an element of R ∪ {∞}. The assignment $U \mapsto \int\_U f\,\mathrm{d}\mu$ yields a measure on X. (Many well-known probability distributions on the reals arise in this way from their density.)
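As a concrete illustration of a density inducing a measure, the following Python sketch (ours; plain trapezoidal integration, no external libraries) approximates the measure of an interval under the standard normal density:

```python
# Sketch: the measure induced by a density, approximated numerically (ours).
import math

def normal_density(x, mu=0.0, sigma=1.0):
    # density of the normal distribution w.r.t. the Lebesgue measure
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def measure_of_interval(a, b, density, steps=10000):
    # mu(U) = integral of the density over U, here by the trapezoidal rule
    h = (b - a) / steps
    total = 0.5 * (density(a) + density(b))
    total += sum(density(a + i * h) for i in range(1, steps))
    return total * h

# The whole line has measure 1 (approximated on a wide interval).
print(measure_of_interval(-10, 10, normal_density))
```

Additivity of the resulting measure can be checked numerically: the measure of [−1, 1] agrees with the sum of the measures of [−1, 0] and [0, 1].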

Kernels. We will make extensive use of kernels, which can be seen as parametrised families of measures. Formally a **kernel** from X to Y is a map k : X × Σ<sub>Y</sub> → [0, ∞] such that for every x ∈ X, k(x, −) is a measure on Y, and for every V ∈ Σ<sub>Y</sub>, k(−, V) is a measurable function. It is a **sub-probability kernel** if each k(x, −) is a sub-probability measure, and it is an **s-finite kernel** if it is a countable (pointwise) sum of sub-probability kernels. Every measurable function f : X → Y induces a Dirac kernel δ<sub>f</sub> : X ⇝ Y : x ↦ δ<sub>f(x)</sub>. Kernels compose: if k : X ⇝ Y and h : Y ⇝ Z then the map h ∘ k : X × Σ<sub>Z</sub> → [0, ∞] defined as $(x, W) \mapsto \int\_Y h(-, W)\,\mathrm{d}k(x, -)$ is also a kernel, and the Dirac kernel δ<sub>id</sub> (often just δ) is an identity for this composition. We note that if both h and k are sub-probability kernels, then h ∘ k is a sub-probability kernel. Finally, observe that a kernel **1** ⇝ X, for **1** a singleton space, is the same thing as a measure on X.
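In the discrete case a kernel is just a (sub-)stochastic matrix, and kernel composition is matrix multiplication; a small sketch of ours, with kernels encoded as nested dicts:

```python
# Discrete kernels X ⇝ Y as dicts: k[x][y] = k(x, {y}). (Encoding is ours.)
# Composition integrates over the middle space, which here is a finite sum.

def compose(h, k):
    # (h ∘ k)(x, {z}) = sum over y of k(x, {y}) * h(y, {z})
    out = {}
    for x, row in k.items():
        acc = {}
        for y, p in row.items():
            for z, q in h.get(y, {}).items():
                acc[z] = acc.get(z, 0.0) + p * q
        out[x] = acc
    return out

k = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.2, 1: 0.8}}   # sub-probability kernel X ⇝ Y
h = {0: {0: 1.0}, 1: {1: 1.0}}                   # Dirac kernel of the identity
assert compose(h, k) == k   # delta is an identity for composition
```

The closure property mentioned above also holds in this encoding: composing two sub-probability kernels keeps every row's total mass at most 1.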

In this paper we will refer to the bernoulli, normal, and uniform families of distributions; all of these are sub-probability kernels from their parameter spaces to N or R. For example, there is a kernel R<sup>2</sup> ⇝ R : ((x, y), U) ↦ μ<sub>N(x,y)</sub>(U), where μ<sub>N(x,y)</sub> is the measure associated with a normal distribution with parameters (x, y), if y > 0, and the 0 measure otherwise. We understand the bernoulli distribution as returning either 0 or 1 ∈ N.

Product spaces and independence. When several random quantities are under study one uses the notion of **product space**: given (X, Σ<sub>X</sub>) and (Y, Σ<sub>Y</sub>) we can equip the set X × Y with the product σ-algebra, written Σ<sub>X×Y</sub>, defined as the smallest one containing all U × V, for U ∈ Σ<sub>X</sub> and V ∈ Σ<sub>Y</sub>.

A measure μ on X × Y gives rise to **marginals** μ<sub>X</sub> and μ<sub>Y</sub>, measures on X and Y respectively, defined by μ<sub>X</sub>(U) = μ(U × Y) and μ<sub>Y</sub>(V) = μ(X × V) for U ∈ Σ<sub>X</sub> and V ∈ Σ<sub>Y</sub>.

Given kernels k : X ⇝ Y and h : Z ⇝ W we define the **product kernel** k × h : X × Z ⇝ Y × W via iterated integration:

$$((x,z),U)\mapsto\int\_{y\in Y} \mathrm{d}k(x,-)\int\_{w\in W} \mathrm{d}h(z,-)\chi\_U(y,w),$$

where χ<sub>U</sub> is the characteristic function of U ∈ Σ<sub>Y×W</sub>. When X = Z = **1** this gives the notion of **product measure**.

The definitions above extend with no difficulty to product spaces $\prod\_{i\in I} X\_i$. A measure P on $\prod\_{i\in I} X\_i$ has marginals P<sub>J</sub> for any J ⊆ I, and we say that X<sub>i</sub> and X<sub>j</sub> are **independent w.r.t. P** if the marginal P<sub>{i,j}</sub> is equal to the product measure P<sub>i</sub> × P<sub>j</sub>.
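For finite spaces, marginals and the independence condition can be checked directly; a sketch of ours with a joint measure on {0,1} × {0,1} encoded as a dict:

```python
# Sketch (ours): marginals and independence for a finite joint measure.
from itertools import product

def marginal_X(P):
    m = {}
    for (x, y), p in P.items():
        m[x] = m.get(x, 0.0) + p   # mu_X(U) = mu(U x Y)
    return m

def marginal_Y(P):
    m = {}
    for (x, y), p in P.items():
        m[y] = m.get(y, 0.0) + p   # mu_Y(V) = mu(X x V)
    return m

def is_independent(P, tol=1e-12):
    # P is independent iff it equals the product of its marginals
    mx, my = marginal_X(P), marginal_Y(P)
    return all(abs(P.get((x, y), 0.0) - mx[x] * my[y]) <= tol
               for x, y in product(mx, my))

P_indep = {(x, y): 0.25 for x, y in product([0, 1], repeat=2)}  # fair, independent
P_corr  = {(0, 0): 0.5, (1, 1): 0.5}                           # perfectly correlated
assert is_independent(P_indep) and not is_independent(P_corr)
```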

### **2.2 Bayesian networks**

An efficient way to define measures on product spaces is using probabilistic graphical models [37], for example Bayesian networks, whose definition we briefly recall now. The idea is to use a graph structure to encode a set of independence constraints between the components of a product space. We first recall the definition of conditional independence. With respect to a joint distribution P on $\prod\_{i\in I} X\_i$, we say X<sub>i</sub> and X<sub>j</sub> are **conditionally independent given** X<sub>k</sub> if there exists a kernel k : X<sub>k</sub> ⇝ X<sub>i</sub> × X<sub>j</sub> such that $P\_{i,j,k}(U\_i \times U\_j \times U\_k) = \int\_{U\_k} k(-, U\_i \times U\_j)\,\mathrm{d}P\_k$ for all measurable U<sub>i</sub>, U<sub>j</sub>, U<sub>k</sub>, and X<sub>i</sub> and X<sub>j</sub> are independent w.r.t. k(x<sub>k</sub>, −) for all x<sub>k</sub> ∈ X<sub>k</sub>. In this definition, k is a conditional distribution of X<sub>i</sub> × X<sub>j</sub> given X<sub>k</sub> (w.r.t. P); under some reasonable conditions [8] it always exists, and the independence condition is the main requirement.

Adapting the presentation used in [27], we define a **Bayesian network** as a directed acyclic graph G = (V, →) where each node v ∈ V is assigned a measurable space M(v). We define the **parents** pa(v) of v to be the set of nodes u with u → v, and its **non-descendants** nd(v) to contain the nodes u such that there is no path v → ··· → u. Writing $\mathcal{M}(S) = \prod\_{v\in S} \mathcal{M}(v)$ for any subset S ⊆ V, a measure P on M(V) is said to be **compatible with** G if for every v ∈ V, M(v) and M(nd(v)) are conditionally independent given M(pa(v)). It is straightforward to verify that given a Bayesian network G, we can construct a compatible measure by supplying, for every v ∈ V, an s-finite kernel k<sub>v</sub> : M(pa(v)) ⇝ M(v).

(In practice, Bayesian networks are used to represent probabilistic models, and so typically every kernel k<sup>v</sup> is strictly probabilistic. Here the k<sup>v</sup> are only required to be s-finite, so they are in general unnormalised. As we will see, this is because we consider possibly conditioned models.)
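The construction of a compatible measure from per-node kernels can be sketched by ancestral sampling: visit nodes in topological order and sample each from its kernel applied to the already-sampled parents. The following Python sketch is ours (the two-node network, its names, and its probabilities are illustrative, not from the paper):

```python
# Sketch (ours): ancestral sampling of a tiny discrete Bayesian network.
import random

nodes = ["rain", "wet"]                       # topologically ordered DAG
parents = {"rain": [], "wet": ["rain"]}
# one probability kernel per node: parent values -> a sample of the node
kernels = {
    "rain": lambda pa: 1 if random.random() < 0.3 else 0,
    "wet":  lambda pa: 1 if random.random() < (0.9 if pa["rain"] else 0.1) else 0,
}

def ancestral_sample():
    # sample each node given its already-sampled parents
    val = {}
    for v in nodes:
        val[v] = kernels[v]({u: val[u] for u in parents[v]})
    return val
```

Repeated sampling yields the joint distribution determined by the kernels, which is compatible with the graph by construction.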

Bayesian networks are an elegant way of constructing models, but they are limited. We now present a programming language whose expressivity goes beyond them.

### **2.3 A language for probabilistic modelling**

Our language of study is a call-by-value statistical language with sums, products, and higher-order types, as well as recursive functions. Languages with comparable features are considered in [11,57,40].

The syntax of this language is described in Fig. 1. Note the distinction between general terms M, N and values V. The language includes the usual term constructors and pattern matching. Base types are the unit type, the real numbers and the natural numbers, and for each of them there are associated constants. The language is parametrised by a set L of labels, a set F of partial measurable functions R<sup>n</sup> → R or R<sup>n</sup> → N, and a set D of standard distribution families, which are sub-probability kernels<sup>1</sup> R<sup>n</sup> ⇝ R or R<sup>n</sup> ⇝ N. There is also a primitive **score** which multiplies the weight of the current trace by the value of its argument. This is an idealised form of conditioning via soft constraints, which justifies the move from sub-probability to s-finite kernels (see [54]).
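The effect of **score** on trace weights can be sketched operationally by likelihood weighting (importance sampling with the prior as proposal). The following Python sketch is ours, not the paper's semantics; all names (`weighted_model`, `posterior_mean`) and the choice of observation are illustrative:

```python
# Sketch (ours): score as trace reweighting, via likelihood weighting.
import math, random

def normal_pdf(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def weighted_model(obs):
    # sample the prior, then 'score' by the density of the observation:
    # the trace's weight is multiplied; control flow is unchanged
    weight = random.gauss(2, 1)
    w = 1.0
    w *= normal_pdf(obs, weight, 0.1)   # plays the role of score
    return weight, w

def posterior_mean(obs, n=50000):
    traces = [weighted_model(obs) for _ in range(n)]
    total = sum(w for _, w in traces)
    return sum(x * w for x, w in traces) / total
```

With prior N(2, 1) and likelihood N(weight, 0.1), observing 2.5 pulls the estimated posterior mean of weight close to the observation, matching the conjugate-normal posterior mean 252/101 ≈ 2.495.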

$$\begin{aligned} A, B &::= \mathbf{1} \mid \mathbb{N} \mid \mathbb{R} \mid A \times B \mid A + B \mid A \to B \\ V, W &::= () \mid \underline{n} \mid \underline{r} \mid \underline{f} \mid (V, W) \mid \mathbf{inl}\, V \mid \mathbf{inr}\, V \mid \lambda x. M \\ M, N &::= V \mid x \mid M \, N \mid M \stackrel{?}{=} 0 \mid \mu x : A \to B.\, M \mid \mathbf{sample}\_{\ell} \, \mathsf{dist}(M\_{1}, \ldots, M\_{n}) \\ &\quad\mid (M, N) \mid \mathbf{match} \, M \, \mathbf{with} \, (x, y) \to P \mid \mathbf{score}\, M \\ &\quad\mid \mathbf{inl} \, M \mid \mathbf{inr} \, M \mid \mathbf{match} \, M \, \mathbf{with} \, [\mathbf{inl} \, x \to N\_{1} \mid \mathbf{inr} \, x \to N\_{2}] \end{aligned}$$

Fig. 1: Syntax.

$$\begin{array}{cccc} \dfrac{r \in \mathbb{R}}{\Gamma \vdash \underline{r} : \mathbb{R}} & \dfrac{\Gamma \vdash M : \mathbb{N}}{\Gamma \vdash M \stackrel{?}{=} 0 : \mathbb{B}} & \dfrac{\Gamma \vdash M : \mathbb{R}}{\Gamma \vdash \mathbf{score}\ M : \mathbf{1}} & \dfrac{\Gamma, x : A \to B \vdash M : A \to B}{\Gamma \vdash \mu x : A \to B.\, M : A \to B} \end{array}$$

$$\begin{array}{cc} \dfrac{(f : \mathbb{R}^{n} \to \mathbb{X}) \in \mathcal{F}}{\Gamma \vdash \underline{f} : \mathbb{R}^{n} \to \mathbb{X}} & \dfrac{(\mathsf{dist} : \mathbb{R}^{n} \to \mathbb{X}) \in \mathcal{D} \quad \text{for } i = 1, \ldots, n, \ \Gamma \vdash M\_{i} : \mathbb{R}}{\Gamma \vdash \mathbf{sample}\_{\ell} \, \mathsf{dist}(M\_{1}, \ldots, M\_{n}) : \mathbb{X}} \end{array}$$

Fig. 2: Subset of typing rules.

Terms of the language are typed in the standard way; in Fig. 2 we present a subset of the rules which could be considered non-standard. We use X to stand for either N or R, and we do not distinguish between the type and the corresponding measurable space. We also write B for **1** + **1**, and use syntactic sugar for let-bindings, sequencing, and conditionals:

$$\begin{aligned} \text{let } x:A = M \text{ in } N \ &:= \ (\lambda x:A.\,N) \ M \\ M;N \ &:= \ \text{let } x:A = M \text{ in } N \quad (\text{for } x \text{ not free in } N) \\ \text{if } M \text{ then } N\_1 \text{ else } N\_2 \ &:= \ \text{match } M \text{ with } [\mathbf{inl} \, x \to N\_1 \mid \mathbf{inr} \, x \to N\_2] \end{aligned}$$

# **3 Programs as event structures**

In this section, we introduce our causal approach. We give a series of examples illustrating how programs can be understood as graph-like structures known as event structures, of which we assume no prior knowledge. Event structures were introduced by Winskel et al. [46], though for the purposes of this work the traditional notion must be significantly enriched.

<sup>1</sup> In any practical instance of the language it would be expected that every kernel in D has a density in F, but this is not strictly necessary here.

Fig. 3

The examples which follow are designed to showcase the following features of the semantics: combination of data flow and control flow with probability (Sec. 3.1), conditional branching (Sec. 3.2), open programs with multiple arguments (Sec. 3.3) and finally higher-order programs (Sec. 3.4). We will then give further definitions in Sec. 3.5 and Sec. 3.6.

Our presentation in Sec. 3.1 and Sec. 3.2 is intended to be informal; we give all the necessary definitions starting from Sec. 3.3.

### **3.1 Control flow, data flow, and probability**

We briefly recall the example of the introduction; the program and its semantics are given in Fig. 3. As before, one arrow relation represents control flow and the other represents data flow. There is a node for each random choice in the program, and the dependency relationships are pictured using the appropriate arrows. Naturally, a data dependency imposes constraints on the control flow: every data flow arrow must be realised by a chain of control flow arrows. There is an additional node for the output value, drawn in a dashed box, which indicates that it is a possible point of interaction with other programs. This will be discussed in Sec. 3.3.

Although this is not pictured in the above diagram, the semantics also comprises a family of kernels, modelling the probabilistic execution according to the distributions specified by the program. Intuitively, each node has a distribution whose parameters are its parents for the data flow relation. For example, the node labelled meas<sub>2</sub> will be assigned a kernel k<sub>meas2</sub> : R ⇝ R defined so that k<sub>meas2</sub>(weight, −) is a normal distribution with parameters (weight, 0.1).

### **3.2 Branching**

Consider a modified scenario in which only one measurement is performed, but with probability 0.01 an error occurs and the scales display a random number between 0 and 10. The corresponding program and its semantics are given in Fig. 4.

In order to represent the conditional statement we have introduced a new element to the graph: a binary relation known as conflict, indicating that two nodes are incompatible: any execution of the program will only encounter one of them. Conflict is hereditary, in the sense that the respective futures of two nodes in conflict are also incompatible. Hence we need two copies of (), one for each branch of the conditional statement. Unsurprisingly, beyond the branching point all events depend on error, since their very existence depends on its value.

We continue our informal presentation with a description of the semantics of open terms. This will provide enough context to formally define the notion of event structure we use in this paper, which differs from others found in the literature.

### **3.3 Programs with free variables**

We turn the example in Sec. 3.2 into one involving two free variables, guess and rate, used as parameters for the distributions of weight and error, respectively. These allow the same program to serve as a model for different situations. Formally we have a term M such that guess : R, rate : R ⊢ M : **1**, given in Fig. 5 with its semantics. We see that the two parameters are themselves represented by nodes, drawn in dotted boxes, showing that (like the output nodes) they are a point of interaction with the program's external environment; this time, a value is received rather than sent. Below, we will distinguish between the different types of nodes by means of a polarity function.

We attach to the parameter nodes the appropriate data dependency arrows. The subtlety here is with control flow: while it is clear that parameter values must be obtained before the start of the execution, so that necessarily guess → weight and rate → weight, it is less clear what relationship guess and rate should have with each other.

In a call-by-value language, we find that leaving program arguments causally independent (of each other) leads to soundness issues. But it would be equally unsound to impose a causal order between them. Therefore, we introduce a form of synchronisation relation, amounting to imposing both guess → rate and rate → guess, which we picture as a single link between guess and rate. In event structure terminology this is known as a coincidence, and was introduced by [19] to study the synchronous π-calculus. Note that in many approaches to call-by-value games (e.g. [31,26]) one would bundle both parameters into a single node representing the pair (guess, rate), but this is not suitable here since our data flow analysis requires separate nodes.

We proceed to define event structures, combining the ingredients we have described so far: control dependency, data dependency, conflict, and coincidence, together with a polarity function, used implicitly above to distinguish between input nodes (−), output nodes (+), and internal random choices (0).

**Definition 1.** An *event structure* E is a set E of events (or nodes) together with the following structure:


Often we write E instead of the whole tuple (E, ≤, #, ∼, pol), where ∼ denotes coincidence. It is sometimes useful to quotient out coincidences: we write **E** for the set of coincidence-equivalence classes, which we denote as boldface letters (**e**, **a**, **s**, ...). It is easy to check that this is also an event structure, with **e** ≤ **e′** (resp. #, ∼) if there are e ∈ **e** and e′ ∈ **e′** with e ≤ e′ (resp. #, ∼), and the evident polarity function.

We will see in Sec. 3.5 how this structure can be equipped with quantitative information (in the form of measurable spaces and kernels). Before discussing higher-order programs, we introduce the fundamental concept of configuration, which will play an essential role in the technical development of this paper.

**Definition 2.** A *configuration* of E is a finite subset x ⊆ E which is down-closed (if e′ ≤ e and e ∈ x then e′ ∈ x) and conflict-free (if e, e′ ∈ x then ¬(e # e′)). The *set of all configurations* of E is denoted C(E); it is a partial order under ⊆.

We introduce some important terminology. For an event e ∈ E, we have defined its **history** [e] above. This is always a configuration of E, and the smallest one containing e. More generally we can define [**e**] = {e′ | e′ ≤ e for some e ∈ **e**}, and [**e**) = [**e**] \ **e**.

The **covering relation** −⊂ defines the smallest non-trivial extensions of a configuration; it is defined as follows: x −⊂ y if there is **e** ∈ **E** such that x ∩ **e** = ∅ and y = x ∪ **e**. We will sometimes write x −⊂<sup>**e**</sup> y. We sometimes annotate −⊂ and ⊆ with the polarities of the added events: so for instance x ⊆<sup>+,0</sup> y if each e ∈ y \ x has polarity + or 0.
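Down-closure and conflict-freeness are directly checkable on finite examples; the following Python sketch (ours; a four-event structure inspired by the branching example, with no coincidences) enumerates all configurations:

```python
# Sketch (ours): configurations of a tiny event structure.
from itertools import chain, combinations

events = {"weight", "error", "ok", "fail"}
below = {"weight": set(), "error": {"weight"},      # strict causes of each event
         "ok": {"weight", "error"}, "fail": {"weight", "error"}}
conflict = {frozenset({"ok", "fail"})}              # the two branches conflict

def is_configuration(x):
    down_closed = all(below[e] <= x for e in x)
    conflict_free = all(frozenset({a, b}) not in conflict
                        for a in x for b in x if a != b)
    return down_closed and conflict_free

def configurations():
    subsets = chain.from_iterable(combinations(events, r)
                                  for r in range(len(events) + 1))
    return [set(s) for s in subsets if is_configuration(set(s))]
```

Here the configurations are ∅, {weight}, {weight, error}, and the two maximal ones extending the branch taken; they form a partial order under ⊆ as in Definition 2.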

### **3.4 Higher-order programs**

We return to a fairly informal presentation; our goal now is to convey intuition about the representation of higher-order programs in the framework of event structures. We will see in Sec. 4 how this representation is obtained from the usual categorical approach to denotational semantics.

Consider yet another faulty-scales scenario, in which the probability of error now depends on the object's weight. Suppose that this dependency is not known by the program, and thus left as a parameter rate : R → R. The resulting program is typed rate : R → R, guess : R ⊢ R, as follows:

$$\begin{aligned} \text{let weight} &= \text{sample}\_{weight} \,\,\text{normal}(guess, 1) \,\,\text{in} \\ \text{let error} &= \text{sample}\_{error} \,\,\text{bernoulli} \,\,(rate \,\,\text{weight}) \,\,\text{in} \,\,\text{error} \end{aligned}$$

We give its semantics in Fig. 6. (To keep things simple this scenario involves no measurements.)

It is an important feature of the semantics presented here that higher-order programs are interpreted as causal structures involving only values of ground type. In the example, the argument rate is initially received not as a mathematical function, but as a single message of unit type (labelled λ<sub>rate</sub>), which gives the program the possibility to call the function rate by feeding it an input value. Because the behaviour of rate is unknown, its output is treated as a new argument to the program, represented by the negative out node. The shaded region highlights the part of the computation during which the program interacts with its argument rate. The semantics accommodates the possibility that rate itself has internal random choices; this will be accounted for in the compositional framework of Sec. 4.

# **3.5 Bayesian event structures**

We show now that event structures admit a probabilistic enrichment.<sup>2</sup>

**Definition 3.** A *measurable event structure* is an event structure together with the assignment of a measurable space M(e) for every event e ∈ E. For any X ⊆ E we set $\mathcal{M}(X) = \prod\_{e\in X} \mathcal{M}(e)$.

As is common in statistics, we often write $\underline{e}$ (or $\underline{X}$) for an element of M(e) (or of M(X)). We now proceed to equip this structure with a kernel for each event.

**Definition 4.** For E an event structure and e ∈ E, we define the *parents* pa(e) of e as the set of events d ∈ E with a data flow arrow from d to e.

**Definition 5.** A *quantitative event structure* is a measurable event structure E with, for every non-negative e ∈ E, a kernel k<sub>e</sub> : M(pa(e)) ⇝ M(e).

Our Bayesian event structures are quantitative event structures satisfying an additional axiom, which we introduce next. This axiom is necessary for a smooth combination of data flow and control flow; without it, the compositional framework of the next section is not possible.

**Definition 6.** Let E be a quantitative event structure. We say that e ∈ E is *non-uniform* if there are distinct $\underline{\mathrm{pa}}(e), \underline{\mathrm{pa}}'(e) \in \mathcal{M}(\mathrm{pa}(e))$ such that

$$k\_e(\underline{\text{pa}}(e), \mathcal{M}(e)) \neq k\_e(\underline{\text{pa}}'(e), \mathcal{M}(e)).$$

We finally define:

**Definition 7.** A *Bayesian event structure* is a quantitative event structure such that if e ∈ E is non-uniform, and e ≤ e′ with e and e′ not coincident, then pa(e) ⊆ pa(e′).

The purpose of this condition is to ensure that Bayesian event structures support a well-behaved notion of "hiding", which we will define in the next section.
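On a finite structure the axiom of Definition 7 is mechanically checkable. The sketch below is ours (the encoding, all names, and the example kernels are illustrative; coincidence is ignored for simplicity): a kernel is summarised by its total masses k<sub>e</sub>(pa, M(e)) per parent value, and an event is non-uniform when these masses differ.

```python
# Sketch (ours): checking the Bayesian event structure axiom (Definition 7).

def non_uniform(kernel_masses):
    # distinct total masses for distinct parent values => non-uniform
    return len(set(kernel_masses.values())) > 1

def is_bayesian(leq, pa, kernels):
    # e non-uniform and e <= e' (with e != e') implies pa(e) ⊆ pa(e')
    return all(pa[e] <= pa[e2]
               for (e, e2) in leq
               if e != e2 and non_uniform(kernels[e]))

pa = {"weight": set(), "error": {"weight"}, "out": {"weight", "error"}}
leq = {("weight", "error"), ("weight", "out"), ("error", "out")}
kernels = {
    "weight": {(): 1.0},              # a single total mass: uniform
    "error": {(0,): 1.0, (1,): 0.5},  # mass depends on the parent: non-uniform
    "out": {(0,): 1.0, (1,): 1.0},
}
assert is_bayesian(leq, pa, kernels)
```

Shrinking pa("out") so that it no longer contains the parents of the non-uniform event "error" makes the check fail, illustrating why the axiom constrains the causal future of unnormalised events.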

# **3.6 Symmetry**

For higher-order programs, event structures in the sense of Definition 1 present a limitation. This has to do with the possibility for a program to call a function argument more than once, which the compositional framework of Sec. 4 does not readily support. We will use a linear logic-inspired "!" to duplicate nodes, thus making certain configurations available in infinitely many copies. The following additional structure, called symmetry, is there to enforce that these configurations yield equivalent behaviour.

<sup>2</sup> We emphasise that our notion of "event" is not related to the usual notion of event in probability theory.

**Definition 8 (Winskel [61]).** A *symmetry* on an event structure E is a family ∼=<sup>E</sup> of bijections θ : x ∼= y, with x, y ∈ C (E), containing all identity bijections and closed under composition and inverses, satisfying the following axioms.


We write θ : x ≅<sub>E</sub> y if (θ : x ≅ y) ∈ ≅<sub>E</sub>. When E is Bayesian, we additionally require k<sub>e</sub> = k<sub>θ(e)</sub> for every non-negative e ∈ x. (This is well-defined because θ preserves data flow and thus pa(θ(e)) = θ pa(e).)

Although symmetry can be mathematically subtle, combining it with additional data on event structures does not usually pose any difficulty [15,48].

In this section we have described Bayesian event structures with symmetry, which are the basic mathematical objects we use to represent programs. A central contribution of this paper is to define a compositional semantics, in which the interpretation of a program is obtained from that of its sub-programs. This is the topic of the next section.

# **4 Games and Bayesian strategies**

The presentation is based on game semantics, a line of research in the semantics of programming languages initiated in [3,33], though the subject has earlier roots in the semantics of linear logic proofs (e.g. [10]).

It is typical of game semantics that programs are interpreted as concrete computational trees, and that higher-order terms are described in terms of the possible interactions with their arguments. As we have seen in the examples of the previous section, this interaction takes the form of an exchange of first-order values. The central technical achievement of game semantics is to provide a method for composing such representations.

To the reader not familiar with game semantics, the terminology may be misleading: the work of this paper hardly retains any connection to game theory. In particular there is no notion of winning. The analogy may be understood as follows for a given program of type Γ ⊢ M : A. There are two players: the program itself, and its environment. The "game", which we study from the point of view of the program, takes place in the arena ⟦Γ ⊢ A⟧, which specifies which moves are allowed (calls to arguments in Γ, internal samples, return values in A, etc.). The semantics of M is a strategy (written ⟦M⟧), which specifies a plan of action for the program to follow in reaction to the moves played by the environment; this plan has to obey the constraints specified by the arena.

### **4.1 An introduction to game semantics based on event structures**

There are many formulations of game semantics in the literature, with varying advantages. This paper proposes to use concurrent games, based on event structures, for reasoning about data flow in probabilistic programs. Originally introduced in [51] (though some important ideas appeared earlier: [25,44]), concurrent games based on event structures have been extensively developed and have found a range of applications.

In Sec. 3, we motivated our approach by assigning event structures to programs; these event structures are examples of strategies, which we will shortly define. First we define arenas, which are the objects of the category we will eventually build. (The morphisms will be strategies.)

Perhaps surprisingly, an arena is also defined as an event structure, though a much simpler one, with no probabilistic information, an empty data dependency relation, and no neutral-polarity events. We call this a **simple event structure**. This event structure does not itself represent any computation, but is simply there to constrain the shape of strategies, just as types constrain programs. Before giving the definition, we present in Fig. 7 the arenas associated with the strategies in Sec. 3.3 and Sec. 3.4, stating which types they represent. Note the **copy indices** (0, 1, ...) in Fig. 7b; these point to duplicated (i.e. symmetric) branches.

Fig. 7: Examples of arenas.

**Definition 9.** An *arena* is a simple, measurable event structure with symmetry A = (A, ≅_A), together with two sub-symmetries ≅⁺_A and ≅⁻_A, subject to the following conditions:


Write init(A) for the set of *initial events*, i.e. those minimal for ≤. We say that A is *positive* if every a ∈ init(A) is positive. (*Negative* arenas are defined similarly.) We say that A is *regular* if whenever a, b ∈ init(A) are distinct, either a and b are coincident or a # b.
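The conditions just stated are directly checkable on a finite representation. The sketch below, again under our own toy encoding (coincidence is modelled as a set of unordered pairs), computes init(A) and tests positivity and regularity:

```python
# Checking the "positive" and "regular" arena conditions on a toy encoding.
# The representation (dicts and frozenset pairs) is ours, for illustration.

def init_events(events, deps):
    # initial events are those with no causal dependencies
    return {e for e in events if not deps.get(e)}

def is_positive(events, deps, pol):
    return all(pol[e] == "+" for e in init_events(events, deps))

def is_regular(events, deps, coincident, conflict):
    # any two distinct initial events must be coincident or in conflict
    ini = init_events(events, deps)
    return all(frozenset((a, b)) in coincident or frozenset((a, b)) in conflict
               for a in ini for b in ini if a != b)

# The arena for R + R: two conflicting initial positive moves.
events = ["inl_r", "inr_r"]
deps = {}
pol = {"inl_r": "+", "inr_r": "+"}
conflict = {frozenset(("inl_r", "inr_r"))}
```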

So, arenas provide a set of moves together with certain constraints for playing those moves. Our definition of strategy is slightly technical, but the various conditions ensure that strategies can be composed soundly; we will explore this second point in Sec. 4.2.

For a strategy S to be well-defined relative to an arena A, each positive or negative move of S must correspond to a move of A; however, neutral moves of S correspond to internal samples of the program, and these should not be constrained by the type. Accordingly, a strategy comprises a partial map σ : S ⇀ A defined precisely on the non-neutral events. The reader should be able to reconstruct this map for the examples of Sec. 3.3 and Sec. 3.4.

**Definition 10.** A *strategy* on an arena A is a Bayesian event structure with symmetry S = (S, ≅_S), together with a partial function σ : S ⇀ A, whose domain of definition is exactly the subset {s ∈ S | pol(s) ≠ 0}, and such that whenever σ(s) is defined, M(σ(s)) = M(s) and pol(σ(s)) = pol(s). This data is subject to the following additional conditions:


Condition (1) amounts to σ being a map of event structures [60]. Combined with (2) and (3), we get the usual notion of a concurrent strategy on an arena with symmetry [17]; finally, (4) is a form of courtesy.

To these four conditions we add the following:

**Definition 11.** A strategy S is *innocent* if conflict is local: s # s′ ⟹ [s) = [s′), and for every s ∈ S, the following conditions hold:


Innocence [33,56,16] prevents any non-local or concurrent behaviour. It is typically used to characterise "purely functional" sequential programs, i.e. those using no state or control features. Here, we use innocence as a way to confine ourselves to a simpler semantic universe. In particular, we avoid the difficulties of combining concurrency and probability [62].

In the rest of the paper, a **Bayesian strategy** is a strategy (Definition 10) that is moreover innocent (Definition 11).

### **4.2 Composition of strategies**

At this point, we have seen how to define arenas, and we have said that the event structures of Sec. 2 arise as strategies σ : S ⇀ A for an arena A. As usual in denotational semantics, these will be obtained compositionally, by induction on the syntax. For this we must move to a categorical setting, in which arenas are objects and strategies are morphisms.

**Strategies as morphisms.** Before we introduce the notion of strategy from A to B, we must introduce some important constructions on event structures.

**Definition 12.** If A is an event structure, its *dual* A^⊥ is the event structure whose structure is the same as that of A except for polarity, which is defined as pol_{A^⊥}(a) = −pol_A(a). (Negative moves become positive, and vice versa; neutral moves are not affected.) For arenas, we define (A, ≅_A, ≅⁻_A, ≅⁺_A)^⊥ = (A^⊥, ≅_A, ≅⁺_A, ≅⁻_A).

Given a family (A_i)_{i∈I} of event structures with symmetry, we define their *parallel composition* ∥_{i∈I} A_i to have events ⋃_{i∈I} A_i × {i}, with polarity, conflict and both kinds of dependency obtained componentwise. Noticing that a configuration x ∈ C(∥_{i∈I} A_i) corresponds to a family (x_i)_{i∈I} where each x_i ∈ C(A_i), and x_i = ∅ for all but finitely many i, we define the symmetry ≅_{∥_{i∈I} A_i} to contain the bijections ∥_i θ_i : ∥_i x_i ≅ ∥_i y_i where each θ_i ∈ ≅_{A_i}. If the A_i are arenas, we define the two other symmetries in the same way.
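Both constructions are mechanical on the toy encoding used earlier: the dual flips polarities pointwise, and parallel composition tags each event with the index of its component, inheriting all structure componentwise. A sketch (encoding ours):

```python
# Dual and parallel composition on a toy encoding of event structures
# with polarity. Illustrative only; symmetry and measure are omitted.

def dual(pol):
    # flip polarities; neutral events are unaffected
    flip = {"+": "-", "-": "+", "0": "0"}
    return {e: flip[p] for e, p in pol.items()}

def parallel(components):
    # components: list of (events, deps, pol); events become (event, i)
    events, deps, pol = [], {}, {}
    for i, (evs, dps, pl) in enumerate(components):
        for e in evs:
            events.append((e, i))
            deps[(e, i)] = {(d, i) for d in dps.get(e, ())}
            pol[(e, i)] = pl[e]
    return events, deps, pol

# A question/answer arena, put in parallel with itself.
a = (["q", "ans"], {"ans": ["q"]}, {"q": "-", "ans": "+"})
evs, deps, pol = parallel([a, a])
```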

We can now define our morphisms: a **strategy from** A **to** B is a strategy on the arena A^⊥ ∥ B, i.e. a map σ : S ⇀ A^⊥ ∥ B. The event structure S consists of A-moves (those mapped to the A^⊥ component), B-moves, and internal (i.e. neutral) events. We sometimes write S : A →⁺ B.

The purpose of the composition operation ⊙, which we proceed to define, is therefore to produce, from a pair of strategies σ : S ⇀ A^⊥ ∥ B and τ : T ⇀ B^⊥ ∥ C, a strategy τ ⊙ σ : T ⊙ S ⇀ A^⊥ ∥ C. A constant feature of denotational games models is that composition is defined in two steps: interaction, in which S and T synchronise by playing matching B-moves, and hiding, where the matching pairs of events are deleted. The setting of this paper allows both σ and τ to be partial maps, so that in general there can be neutral events in both S and T; these never synchronise, and indeed they should not be hidden, since we aim to give an account of internal sampling.

Before moving on to composition, a word of warning: the resulting structure will not be a category. Instead, arenas and strategies assemble into a weaker structure called a bicategory [6]. Bicategories have objects, morphisms, and 2-cells (morphisms between morphisms), and the associativity and identity laws are relaxed: they only need to hold up to isomorphism. (This situation is relatively common for intensional models of non-determinism.)

**Definition 13.** Two strategies σ : S ⇀ A^⊥ ∥ B and σ′ : S′ ⇀ A^⊥ ∥ B are *isomorphic* if there is a bijection f : S ≅ S′ preserving all structure, and such that for every x ∈ C(S), the bijection with graph {(σ(s), σ′(f(s))) | s ∈ x} is in ≅⁺_{A^⊥ ∥ B}.

Intuitively, S and S′ have the same moves up to the choice of copy indices. We know from [17] that isomorphism is preserved by composition (and all other constructions), so from now on we always consider strategies up to isomorphism; then we will get a category.

**Interaction.** In what follows we assume fixed Bayesian innocent strategies S : A →⁺ B and T : B →⁺ C as above, and study their interaction. We have hinted at the concept of "matching events", but the more convenient notion is that of matching configurations, which we define next.

**Definition 14.** Configurations x_S ∈ C(S) and x_T ∈ C(T) are *matching* if there are x_A ∈ C(A) and x_C ∈ C(C) such that σx_S ∥ x_C = x_A ∥ τx_T.
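In concrete terms, two configurations match when they project to the same set of B-moves; the A- and C-sides are unconstrained, and neutral events play no role. A sketch of this check, on events labelled with their component (the labelling scheme is ours):

```python
# Matching configurations, sketched on labelled events: each event maps
# to (component, move) or to None when it is neutral. Encoding ours.

def proj(x, labels, side):
    # the moves of the given component played in configuration x
    return {labels[e][1] for e in x
            if labels[e] is not None and labels[e][0] == side}

def matching(x_s, sigma, x_t, tau):
    # x_S and x_T match when their B-projections agree
    return proj(x_s, sigma, "B") == proj(x_t, tau, "B")

sigma = {"s0": ("A", "q"), "s1": None, "s2": ("B", "call")}  # S on A^⊥ ∥ B
tau   = {"t0": ("B", "call"), "t1": ("C", "ret")}            # T on B^⊥ ∥ C
```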

There is an event structure with symmetry T ⊛ S whose configurations correspond precisely to matching pairs; it is a well-known fact in game semantics that innocent strategies compose "like relations" [43,15]. Because "matching" B-moves have a different polarity in S and T, there is an ambiguity in the polarity of some events in T ⊛ S; we address this after the lemma.

**Lemma 1.** Ignoring polarity, there is, up to isomorphism, a unique event structure with symmetry T ⊛ S, such that:


Furthermore, for every e ∈ T ⊛ S, at least one of Π_S(e) and Π_T(e) is defined.

When reasoning about the polarity of events in T ⊛ S, a subtlety arises because B-moves are not assigned the same polarity in S and T. This is not surprising: polarity is there precisely to allow strategies to communicate by sending (+) and receiving (−) values; in this interaction, S and T play complementary roles. To reason about the flow of information in the event structure T ⊛ S it will be important, for each B-move e of T ⊛ S, to know whether it is positive in S or in T; in other words, whether information is flowing from S to T, or vice versa.

Accordingly, we define pol^⊛ : T ⊛ S → {+^S, +^T, 0^S, 0^T, −} as follows:

$$\operatorname{pol}^{\circledast}(e) = \begin{cases} +^{\mathcal{S}} \text{ (resp. } 0^{\mathcal{S}}) & \text{if } \Pi_{\mathcal{S}}(e) \text{ is defined and } \operatorname{pol}(\Pi_{\mathcal{S}}(e)) = + \text{ (resp. } 0\text{)}, \\ +^{\mathcal{T}} \text{ (resp. } 0^{\mathcal{T}}) & \text{if } \Pi_{\mathcal{T}}(e) \text{ is defined and } \operatorname{pol}(\Pi_{\mathcal{T}}(e)) = + \text{ (resp. } 0\text{)}, \\ - & \text{otherwise.} \end{cases}$$
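The case analysis above transcribes directly into code. In this sketch (representation ours) each interaction event carries the polarity of its projection in S and in T, or None when the projection is undefined:

```python
# Polarity of an interaction event, transcribing the case analysis:
# a + or 0 can come from at most one of the two projections.

def pol_interaction(pi_s, pi_t):
    """pi_s, pi_t: polarity ('+', '-', '0') of the projections in S
    and T, or None when the projection is undefined."""
    if pi_s in ("+", "0"):
        return "+S" if pi_s == "+" else "0S"
    if pi_t in ("+", "0"):
        return "+T" if pi_t == "+" else "0T"
    return "-"
```

For example, a B-move sent by S is positive in S and negative in T, so it is classified +^S; the same move sent by T is classified +^T.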

**Probability in the interaction.** Unlike with polarity, S and T agree on which measurable space to assign to each B-move, since by the conditions on strategies this is determined by the arena. So for each e ∈ T ⊛ S we can set M(e) = M(Π_S(e)) or M(Π_T(e)), unambiguously, and an easy argument shows that this makes T ⊛ S a well-defined measurable event structure with symmetry.

We can turn T ⊛ S into a quantitative event structure by defining a kernel k^⊛_e : M(pa(e)) ⇝ M(e) for every e ∈ T ⊛ S such that pol^⊛(e) ≠ −. The key observation is that when pol^⊛(e) ∈ {+^S, 0^S}, the parents of e correspond precisely to the parents of Π_S(e) in S. Since Π_S preserves the measurable space associated to an event, we may then take k^⊛_e = k_{Π_S(e)}.

**Hiding.** Hiding is the process of deleting the B-moves from T ⊛ S, yielding a strategy from A to C. The B-moves are exactly those on which both projections are defined, so the new set of events is obtained as follows:

$$T \odot S = \{e \in T \circledast S \mid \Pi_S(e) \text{ and } \Pi_T(e) \text{ are not both defined}\}.$$
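The displayed equation is a simple filter: keep exactly the events on which the two projections are not both defined. A sketch on the dictionary encoding used earlier (names and events ours):

```python
# Hiding: delete the B-moves, i.e. the events on which both
# projections Pi_S and Pi_T are defined. Encoding ours.

def hide(events, proj_s, proj_t):
    # proj_s / proj_t map interaction events to their S- / T-components
    return [e for e in events if not (e in proj_s and e in proj_t)]

events = ["recv_a", "flip", "call_b", "tick", "ret_c"]
proj_s = {"recv_a": "s0", "flip": "s1", "call_b": "s2"}   # Pi_S
proj_t = {"call_b": "t0", "tick": "t1", "ret_c": "t2"}    # Pi_T
visible = hide(events, proj_s, proj_t)
```

Only the synchronised B-move `call_b` is hidden; the neutral events `flip` and `tick` (internal samples of S and T) survive, as intended.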

This set inherits a preorder ≤, conflict relation #, and measurable structure directly from T ⊛ S. Polarity is lifted from either S or T via the projections. (Note that by removing the B-moves we have resolved the polarity mismatch.) To define the data-flow dependency, we must take care to ensure that the resulting T ⊙ S is Bayesian. For e, e′ ∈ T ⊙ S, we say e ⇝ e′ if one of the following holds:


From a configuration x ∈ C(T ⊙ S) we can recover the hidden moves to get an **interaction witness** x̄ = {e ∈ T ⊛ S | e ≤ e′ for some e′ ∈ x}, a configuration of C(T ⊛ S). For x, y ∈ C(T ⊙ S), a bijection θ : x ≅ y is in ≅_{T ⊙ S} if there is θ̄ : x̄ ≅_{T ⊛ S} ȳ which restricts to θ. This gives a measurable event structure with symmetry T ⊙ S.

To make T ⊙ S a Bayesian event structure, we must define for every e ∈ T ⊙ S a kernel k_e, which we denote k^⊙_e to emphasise the difference with the kernel k^⊛_e defined above. Indeed, the parents pa^⊛(e) of e in T ⊛ S may no longer exist in T ⊙ S, where e has a different set of parents pa^⊙(e).

We therefore consider the subset of hidden ancestors of e which ought to affect the kernel k^⊙_e:

**Definition 15.** For strategies S : A →⁺ B and T : B →⁺ C, and e ∈ T ⊙ S, an *essential hidden ancestor* of e is a B-move d ∈ T ⊛ S such that d ≤ e and one of the following holds:


Since T ⊙ S is innocent, e has a sequential history, and thus the set of essential hidden ancestors of e forms a finite, total preorder, for which there exists a linear enumeration d₁ ≤ ⋯ ≤ dₙ. We then define k^⊙_e : M(pa^⊙(e)) ⇝ M(e) as follows:

$$k_e^{\odot}(\underline{\mathrm{pa}^{\odot}(e)}, U) = \int_{d_1} k_{d_1}^{\circledast}(\underline{\mathrm{pa}^{\circledast}(d_1)}, \mathrm{d}d_1) \cdots \int_{d_n} k_{d_n}^{\circledast}(\underline{\mathrm{pa}^{\circledast}(d_n)}, \mathrm{d}d_n)\; k_e^{\circledast}(\underline{\mathrm{pa}^{\circledast}(e)}, U)$$

where we abuse notation: using that for every i ≤ n, pa^⊛(d_i) ⊆ pa^⊙(e) ∪ {d_j | j < i}, we may write pa^⊛(d_i) for the only element of M(pa^⊛(d_i)) compatible with pa^⊙(e) and d₁, …, d_{i−1}. The particular choice of linear enumeration does not matter, by Fubini's theorem for s-finite kernels.
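A discrete instance makes the formula, and the role of Fubini, concrete: integrating out hidden ancestors becomes a nested sum, and exchanging the order of summation leaves the marginal unchanged. The kernels below are our own toy example, not taken from the paper:

```python
from itertools import product

# Two hidden binary ancestors d1, d2 with kernels k1, k2, feeding a
# visible event e with kernel ke. Integrating them out is a nested sum;
# by Fubini (here: commutativity of finite sums) the order is irrelevant.
k1 = {0: 0.3, 1: 0.7}          # distribution of hidden d1
k2 = {0: 0.6, 1: 0.4}          # distribution of hidden d2

def ke(d1, d2):
    # kernel of the visible event, conditioned on the hidden values
    return {0: 0.5, 1: 0.5} if d1 == d2 else {0: 0.9, 1: 0.1}

def marginal(pairs):
    out = {0: 0.0, 1: 0.0}
    for d1, d2 in pairs:
        for v, p in ke(d1, d2).items():
            out[v] += k1[d1] * k2[d2] * p
    return out

m12 = marginal(product(k1, k2))                     # sum d1 then d2
m21 = marginal((d1, d2) for d2, d1 in product(k2, k1))  # sum d2 then d1
```

Both enumeration orders yield the same marginal kernel for e, mirroring the claim that the linear enumeration d₁ ≤ ⋯ ≤ dₙ is immaterial.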

**Lemma 2.** There is a map τ ⊙ σ : T ⊙ S ⇀ A^⊥ ∥ C making T ⊙ S a Bayesian strategy. We call this the *composition* of S and T.

**Copycat.** We have defined morphisms between arenas, and how they compose. We now define identities, called copycat strategies. In the semantics of our language, these are used to interpret typing judgements of the form x : A ⊢ x : A, and the copycat acts by forwarding values received on one side across to the other. To guide the intuition, the copycat strategy for the game ⟦R⟧^⊥ ∥ ⟦R⟧ is pictured in Fig. 8. (We will define the construction later.)

Fig. 8: The arena ⟦R⟧^⊥ ∥ ⟦R⟧ (a), and the copycat strategy on it (b).

Formally, the copycat strategy on an arena A is a Bayesian event structure (with symmetry) CC_A, together with a (total) map cc_A : CC_A → A^⊥ ∥ A. As should be clear from the example of Fig. 8, the events, polarity, conflict, and measurable structure of CC_A are those of A^⊥ ∥ A. The order ≤ is the transitive closure of that of A^⊥ ∥ A enriched with the pairs {((a, 1), (a, 2)) | a ∈ A and pol_A(a) = +} ∪ {((a, 2), (a, 1)) | pol_A(a) = −}. The same sets of pairs also make up the data-dependency relation ⇝ in CC_A; recall that there is no data dependency in the event structure A. Note that because CC_A is just A^⊥ ∥ A with added constraints, configurations of CC_A can be seen as a subset of those of A^⊥ ∥ A, and thus the symmetry ≅_{CC_A} is inherited from ≅_{A^⊥ ∥ A}.

To make copycat a Bayesian strategy, we observe that for every positive e ∈ CC_A, pa(e) contains a single element, the corresponding negative move in A^⊥ ∥ A, which carries the same measurable space. Naturally, we take k_e : M(e) ⇝ M(e) to be the identity kernel.
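The added order pairs of the copycat construction can be generated mechanically: each positive move on one side waits for its negative counterpart on the other side. A sketch on the toy encoding (side 1 is the A^⊥ component, side 2 the A component; representation ours):

```python
# The extra ordering pairs of copycat on A^⊥ ∥ A, as described above:
# for pol_A(a) = + add ((a,1),(a,2)); for pol_A(a) = - add ((a,2),(a,1)).

def copycat_pairs(events, pol):
    pairs = set()
    for a in events:
        if pol[a] == "+":
            pairs.add(((a, 1), (a, 2)))
        elif pol[a] == "-":
            pairs.add(((a, 2), (a, 1)))
    return pairs

# Copycat on the arena for R: a single positive move carrying a real.
r_pairs = copycat_pairs(["r"], {"r": "+"})
```

These same pairs double as the data-dependency relation, and since each positive event has exactly one parent with the same measurable space, its kernel is simply the identity.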

We have defined objects, morphisms, composition, and identities. They assemble into a category.

**Theorem 1.** Arenas and Bayesian strategies, with the latter considered up to isomorphism, form a category **BG**. **BG** has a subcategory **BG**⁺ whose objects are positive, regular arenas and whose morphisms are *negative strategies* (i.e. strategies whose initial moves are negative), up to isomorphism.

The restriction implies (using receptivity) that for every strategy S : A →⁺ B in **BG**⁺, the initial moves of S correspond to init(A). This reflects the dynamics of a call-by-value language, where arguments are received before anything else. We now set out to define the semantics of our language in **BG**⁺.

# **5 A denotational model**

In Sec. 5.1, we describe some abstract constructions in the category, which provide the necessary ingredients for interpreting types and terms in Sec. 5.2.

# **5.1 Categorical structure**

The structure required to model a calculus of this kind is fairly standard. The first games model for a call-by-value language was given by Honda and Yoshida [31] (see also [4]). Their construction was re-enacted in the context of concurrent games by Clairambault et al. [20], from whom we draw inspiration. The adaptation is not, however, automatic, as we must account for measurability, probability, data flow, and an interpretation of product types based on coincidences.

**Coproducts.** Given arenas A and B, their **sum** A + B has as events those of A ∥ B, with inherited polarity, preorder, and measurable structure, but the conflict relation is extended so that a # b for every a ∈ A and b ∈ B. The symmetries ≅_{A+B}, ≅⁻_{A+B} and ≅⁺_{A+B} are restricted from ≅_{A∥B}, ≅⁻_{A∥B} and ≅⁺_{A∥B}.

The arena A + B is a coproduct of A and B in **BG**⁺. This means that there are injections ι_A : A →⁺ A + B and ι_B : B →⁺ A + B behaving as copycat on the appropriate component, and that any two strategies σ : A →⁺ C and τ : B →⁺ C induce a unique co-pairing strategy denoted [σ, τ] : A + B →⁺ C. This construction can be performed for any arity, giving coproducts Σ_{i∈I} A_i.
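The sum construction itself is a disjoint union with all cross-component pairs put in conflict, so that a configuration can only ever explore one summand. A sketch on the toy encoding (tags "L"/"R" are ours):

```python
# Sum of arenas: disjoint union of events, with every cross-component
# pair added to the conflict relation. Encoding ours, for illustration.

def arena_sum(a_events, a_conflict, b_events, b_conflict):
    events = [("L", e) for e in a_events] + [("R", e) for e in b_events]
    conflict = {frozenset((("L", x), ("L", y))) for x, y in a_conflict}
    conflict |= {frozenset((("R", x), ("R", y))) for x, y in b_conflict}
    conflict |= {frozenset((("L", x), ("R", y)))   # summands exclude each other
                 for x in a_events for y in b_events}
    return events, conflict

# The arena for R + N: one move on each side, mutually exclusive.
evs, cf = arena_sum(["r"], [], ["n"], [])
```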

**Tensor.** Tensor products are more subtle, partly because in this paper we use coincidence to deal with pairs, as motivated in Sec. 3.3. For example, given two arenas each having a single initial move, we construct their tensor product by taking their parallel composition and making the two initial moves coincident.

Fig. 9: Example of tensor construction.

More generally, suppose A and B are arenas in which all initial events are coincident; we call these **elementary arenas**. Then A ⊗ B has all structure inherited from A ∥ B, and additionally we make a and b coincident for every a ∈ init(A) and b ∈ init(B). Since C(A ⊗ B) ⊆ C(A ∥ B), we can define the symmetries on A ⊗ B by restricting those of A ∥ B.

Now, because arenas in **BG**⁺ are regular (Definition 9), it is easy to see that each A is isomorphic to a sum Σ_{i∈I} A_i with each A_i elementary. If B ∈ **BG**⁺ is isomorphic to Σ_{j∈J} B_j with the B_j elementary, we define A ⊗ B = Σ_{i,j} A_i ⊗ B_j.

In order to give semantics to pairs of terms, we must define the action of ⊗ on strategies. Consider two strategies σ : S ⇀ A^⊥ ∥ A′ and τ : T ⇀ B^⊥ ∥ B′. Let σ ∥ τ : S ∥ T ⇀ (A ∥ B)^⊥ ∥ (A′ ∥ B′) be defined in the obvious way from σ and τ (note that the codomain was rearranged). We observe that C((A ⊗ B)^⊥ ∥ (A′ ⊗ B′)) ⊆ C((A ∥ B)^⊥ ∥ (A′ ∥ B′)) and show:

**Lemma 3.** Up to symmetry, there is a unique event structure S ⊗ T such that C(S ⊗ T) = {x ∈ C(S ∥ T) | (σ ∥ τ)x ∈ C((A ⊗ B)^⊥ ∥ (A′ ⊗ B′))} and such that polarity, labelling, and data flow are lifted from S ∥ T via a projection function S ⊗ T → S ∥ T.

Informally, the strategies synchronise at the start, i.e. all initial moves are received at the same time, and they synchronise again when they are both ready to move to the A′ ⊗ B′ side for the first time.

The operations − ⊗ B and A ⊗ − on **BG**⁺ define functors. However, as is typically the case for models of call-by-value, the tensor fails to be bifunctorial, and thus **BG**⁺ is not monoidal but only premonoidal [50]. The unit for ⊗ is the arena **1** with one (positive) event () : **1**. There are "copycat-like" associativity, unit and braiding strategies, which we omit.

The failure of bifunctoriality in this setting means that for σ : A →⁺ A′ and τ : B →⁺ B′, the strategy S ⊗ T is in general distinct from the following two strategies:

$$\mathcal{S} \otimes_l \mathcal{T} = (\mathrm{CC}_{\mathcal{A}'} \otimes \mathcal{T}) \odot (\mathcal{S} \otimes \mathrm{CC}_{\mathcal{B}}) \qquad \mathcal{S} \otimes_r \mathcal{T} = (\mathcal{S} \otimes \mathrm{CC}_{\mathcal{B}'}) \odot (\mathrm{CC}_{\mathcal{A}} \otimes \mathcal{T})$$

See Fig. 9 for an example of the ⊗ and ⊗_l constructions on simple strategies. Observe that the data-flow relation is not affected by the choice of tensor; this is related to our discussion of commutativity in Sec. 1.1: a commutative semantics is one that satisfies ⊗_l = ⊗_r = ⊗.

We will make use of the left tensor ⊗_l in our denotational semantics, because it reflects a left-to-right evaluation strategy, which is standard. It will also be important that the interpretation of values lies in the **centre** of the premonoidal category, which consists of those strategies S for which S ⊗_l T = S ⊗_r T and T ⊗_l S = T ⊗_r S for every T. Finally, we note that ⊗ distributes over +, in the sense that for every A, B, C the canonical strategy (A ⊗ B) + (A ⊗ C) →⁺ A ⊗ (B + C) has an inverse λ.

**Function spaces.** We now investigate the construction of arenas of the form A ⊸ B. This is a linear function space construction, allowing at most one call to the argument A; later in this section we construct an extended arena !(A ⊸ B) permitting arbitrary usage. Given A and B we construct A ⊸ B as follows. (This construction is the same as in other call-by-value game semantics, e.g. [31,20].) Recall that we can write A = Σ_{i∈I} A_i with each A_i an elementary arena. Then, A ⊸ B has the same set of events as **1** ∥ ∥_{i∈I}(A_i^⊥ ∥ B), with inherited polarity and measurable structure, but with a preorder enriched with the pairs (λ, a) for a ∈ init(A), and (a, b) for a ∈ init(A_i) and b ∈ init(B) in the i-th component, where λ is the unique move of **1**.

For every strategy σ : A ⊗ B →⁺ C we call Λ(σ) : A →⁺ (B ⊸ C) the strategy which, upon receiving an opening A-move (or coincidence) **a**, deterministically (and with no data-flow link) plays the move λ in B ⊸ C, waits for Opponent to play a B-move (or coincidence) **b**, and continues as σ would on input **a b**. Additionally there is, for every B and C, an evaluation morphism ev_{B,C} : (B ⊸ C) ⊗ B →⁺ C defined as in [20].

**Lemma 4.** For a strategy σ : A ⊗ B →⁺ C, the strategy Λ(σ) is central and satisfies ev ⊙ (Λ(σ) ⊗ cc) = σ.

**Duplication.** We define, for every arena A, a "reusable" arena !A. Its precise purpose will become clear when we define the semantics of our language. It is helpful to start with the observation that ground-type values are readily duplicable, in the sense that there is a strategy ⟦R⟧ →⁺ ⟦R⟧ ⊗ ⟦R⟧ in **BG**. Therefore ! will have no effect on ⟦R⟧, but only on more sophisticated arenas (e.g. ⟦R⟧ ⊸ ⟦R⟧) for which no such (well-behaved) map exists. We start by studying negative arenas.

**Definition 16.** Let A be a negative arena. We define !A to be the measurable event structure !A = ∥_{i∈ω} A, equipped with the following symmetries:


It can be shown that !A is a well-defined negative arena, i.e. it meets the conditions of Definition 9. Observe that an elementary positive arena B corresponds precisely to a set **e** of coincident positive events, all initial for ≤, immediately followed by a negative arena which we call B₋. Followed here means that e ≤ b for all e ∈ **e** and b ∈ B₋, and we write B = **e** · B₋. We define !B = **e** · !B₋. Finally, recall that an arbitrary positive arena B can be written as a sum of elementary ones: B = Σ_{i∈I} B_i. We then define !B = Σ_{i∈I} !B_i.

Fig. 10: Constant strategies. (The copy indices i in ⟦f⟧ indicate that we have ω symmetric branches.)

For positive A and B, a central strategy σ : A →⁺ B induces a strategy !σ : !A →⁺ !B, and this is functorial. The functor ! extends to a linear exponential comonad on the category with elementary arenas as objects and central strategies as morphisms (see [20] for the details of a similar construction).

**Recursion.** To interpret fixed points, we consider an ordering relation on strategies. We momentarily break our habit of considering strategies up to isomorphism, as in this instance it becomes technically inconvenient [17].

**Definition 17.** If σ : S ⇀ A and τ : T ⇀ A are strategies, we write S ⊑ T if S ⊆ T, the inclusion map is a map of event structures preserving all structure, including kernels, and for every s ∈ S, σ(s) = τ(s).

**Lemma 5.** Every ω-chain S₀ ⊑ S₁ ⊑ … has a least upper bound ⋁_{i∈ω} S_i, given by the union ⋃_{i∈ω} S_i, with all structure obtained by componentwise union.
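Lemma 5 can be pictured in miniature: the approximants of a recursive term form a chain under inclusion, and the fixed point is their union. In the sketch below (our own toy stand-in, where a "strategy" is just its set of events) the k-th approximant unfolds a loop k times:

```python
# Lemma 5 in miniature: approximants of a recursive strategy form an
# inclusion chain; the fixed point is their union. Encoding ours.

def approximant(k):
    # events of the k-th unfolding of a loop: call_i then ret_i, i < k
    return {(tag, i) for i in range(k) for tag in ("call", "ret")}

chain = [approximant(k) for k in range(5)]
lub = set().union(*chain)   # least upper bound = componentwise union
```

In the semantics, ⟦μx.M⟧ is obtained analogously, as the least upper bound of the chain of approximants starting from the least strategy ⊥.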

There is also a least strategy ⊥ on every arena, unique up to isomorphism. We are now ready to give the semantics of our language.

### **5.2 Denotational semantics**

The interpretation of types is as follows:


Fig. 11: Interpretation of terms as strategies. (Each clause composes the interpretations of the subterms with the structural strategies wΓ, cΓ, hΓ, cc, Λ, ev and the left tensor ⊗_l; the fixed point ⟦μx.M⟧^Γ is the least upper bound ⋁_{i∈ω} M_i^{Γ,x}(⊥) of the chain of approximants, where M_0^{Γ,x}(⊥) = ⊥.)

$$[\![A+B]\!] = [\![A]\!] + [\![B]\!] \qquad [\![A\times B]\!] = [\![A]\!] \otimes [\![B]\!] \qquad [\![A\to B]\!] = \;!([\![A]\!] \multimap [\![B]\!])$$

This interpretation extends to contexts via ⟦·⟧ = **1** and ⟦x₁ : A₁, …, xₙ : Aₙ⟧ = ⟦A₁⟧ ⊗ … ⊗ ⟦Aₙ⟧. (In Fig. 7 we used ⟦Γ ⊢ A⟧ to refer to the arena ⟦Γ⟧^⊥ ∥ ⟦A⟧.)


**Lemma 6.** For a value Γ ⊢ V : A, the strategy ⟦V⟧^Γ is central.

The semantics is sound for the usual call-by-value equations.

**Proposition 1.** For arbitrary terms M, P, N₁, N₂ and values V, W,

$$\begin{aligned} [\![(\lambda x.M)\, V]\!]^{\Gamma} &= [\![M[V/x]]\!]^{\Gamma} \\ [\![\mathbf{match}\ (V,W)\ \mathbf{with}\ (x,y) \to P]\!]^{\Gamma} &= [\![P[V/x][W/y]]\!]^{\Gamma} \\ [\![\mathbf{match}\ \mathbf{inl}\ V\ \mathbf{with}\ [\mathbf{inl}\ x \to N_1 \mid \mathbf{inr}\ x \to N_2]]\!]^{\Gamma} &= [\![N_1[V/x]]\!]^{\Gamma} \end{aligned}$$

The equations are directly verified. Standard reasoning principles apply given the categorical structure we have outlined above. (It is well known that premonoidal categories provide models for call-by-value [50], and our interpretation is a version of Girard's translation of call-by-value into linear logic [29].)

# **6 Conclusion and perspectives**

We have defined, for every term Γ ⊢ M : A, a strategy ⟦M⟧^Γ. This gives a model for probabilistic programming which provides an explicit representation of data flow. In particular, if ⊢ M : **1** and M has no subterm of type B + C, then the Bayesian strategy ⟦M⟧ is a Bayesian network equipped with a total ordering of its nodes: the control-flow relation ≤. Our proposed compositional semantics additionally supports sum types, higher types, and open terms.

This paper does not contain an adequacy result, largely for lack of space: the 'Monte Carlo' operational semantics of probabilistic programs is difficult to define in full rigour. In further work I hope to address this and carry out the integration of causal models into the framework of [53]. The objective remains to obtain proofs of correctness for existing and new inference algorithms.

**Related work on denotational semantics.** Our representation of data flow based on coincidences and a relation ⇝ is novel, but the underlying machinery relies on existing work in concurrent game semantics, in particular the framework of games with symmetry developed by Castellan et al. [17]. This was applied to a language with discrete probability in [15], and to a call-by-name and affine language with continuous probability in [49]. This paper is the first instance of a concurrent games model for a higher-order language with recursion and continuous probability, and the first to track internal sampling and data flow.

There are other interactive models for statistical languages, e.g. by Ong and Vákár [47] and Dal Lago et al. [38]. Their objectives are different: they do not address data flow (i.e. their semantics only represents the control flow), and they do not record internal samples.

Prior to the development of probabilistic concurrent games, probabilistic notions of event structures were considered by several authors (see [58,1,59]). The literature on probabilistic Petri nets contains important related work, as Petri nets can sometimes provide finite representations for infinite event structures. Markov nets [7,2] satisfy conditional independence conditions based on the causal structure of Petri nets. More recently, Bruni et al. [12,13] relate a form of Petri nets to Bayesian networks and inference, though their probability spaces are discrete.

Related work on graphical representations. Our event structures are reminiscent of Jeffrey's graphical language for premonoidal categories [35], which combines string diagrams [36] with a control flow relation. Note that in event structures the conflict relation provides a model for sum types, which is difficult to obtain in Jeffrey's setting. The problem of representing sum types arises also in probabilistic modelling, because Bayesian networks do not support them: [45] propose an extended graphical language, which could serve to interpret first-order probabilistic programs with conditionals. Another approach is by [42], whose Bayesian networks have edges labelled by predicates describing the branching condition. Finally, the theory of Bayesian networks has also been investigated extensively by Jacobs [34] with a categorical viewpoint. It will be important to understand the formal connections between our work and the above.

# **References**



# **Temporal Refinements for Guarded Recursive Types**

Guilhem Jaber<sup>1</sup> and Colin Riba<sup>2</sup>

<sup>1</sup> Université de Nantes, LS2N CNRS, Inria, Nantes, France. guilhem.jaber@univ-nantes.fr
<sup>2</sup> Univ Lyon, EnsL, UCBL, CNRS, LIP, F-69342, Lyon Cedex 07, France. colin.riba@ens-lyon.fr

**Abstract.** We propose a logic for temporal properties of higher-order programs that handle infinite objects like streams or infinite trees, represented via coinductive types. Specifications of programs use safety and liveness properties. Programs can then be proven to satisfy their specification in a compositional way, our logic being based on a type system. The logic is presented as a refinement type system over the guarded λ-calculus, a λ-calculus with guarded recursive types. The refinements are formulae of a modal μ-calculus which embeds usual temporal modal logics such as LTL and CTL. The semantics of our system is given within a rich structure, the topos of trees, in which we build a realizability model of the temporal refinement type system.

**Keywords:** coinductive types, guarded recursive types, μ-calculus, refinement types, topos of trees.

# **1 Introduction**

Functional programming is by now a well-established way to handle infinite data, thanks to declarative definitions and equational reasoning on high-level abstractions, in particular when infinite objects are represented with coinductive types. In such settings, programs in general do not terminate, but are expected to compute a part of their output in finite time. For example, a program expected to generate a stream should produce the next element in finite time: it is productive.
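Productivity can be illustrated outside the paper's calculus. Below is a minimal Python sketch (all names `cons`, `take`, `ones` are ours, not the paper's) in which a stream is a pair of a head and a thunk of its tail; the recursive call sits under the thunk, so each element is computed in finite time.

```python
# A stream is modelled as (head, thunk-of-tail); the thunk delays the tail.
def cons(head, tail_thunk):
    return (head, tail_thunk)

def take(n, stream):
    """Force the first n elements; terminates when the stream is productive."""
    out = []
    for _ in range(n):
        head, tail_thunk = stream
        out.append(head)
        stream = tail_thunk()
    return out

def ones():
    # Productive definition: the recursive call is guarded by a thunk.
    return cons(1, ones)

print(take(4, ones()))
```

Forcing any finite prefix of `ones()` terminates, even though the stream itself is infinite.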

Our goal is to prove input-output temporal properties of higher-order programs that handle coinductive types. Logics like LTL, CTL or the modal μ-calculus are widely used to formulate, on infinite objects, safety and liveness properties. Safety properties state that some "bad" event will not occur, while liveness properties specify that "something good" will happen (see e.g. [9]). Typically, modalities like ✷ (always) or ✸ (eventually) are used to write properties of streams or infinite trees and specifications of programs over such data.

We consider temporal refinement types {A | ϕ}, where A is a standard type of our programming language, and ϕ is a formula of the modal μ-calculus. Using

This work was partially supported by the ANR-14-CE25-0007 - RAPIDO and by the LABEX MILYON (ANR-10-LABX-0070) of Université de Lyon.

© The Author(s) 2021

N. Yoshida (Ed.): ESOP 2021, LNCS 12648, pp. 548–578, 2021. https://doi.org/10.1007/978-3-030-72019-3_20

refinement types [22], temporal connectives are not reflected in the programming language, and programs are formally independent from the shape of their temporal specifications. One can thus give different refinement types to the same program. For example, the following two types can be given to the same map function on streams:

$$\begin{array}{c} \mathsf{map}: (\{B \mid \psi\} \to \{A \mid \varphi\}) \longrightarrow \{\mathsf{Str}\, B \mid \Box \Diamond [\mathsf{hd}]\psi\} \longrightarrow \{\mathsf{Str}\, A \mid \Box \Diamond [\mathsf{hd}]\varphi\} \\\mathsf{map}: (\{B \mid \psi\} \to \{A \mid \varphi\}) \longrightarrow \{\mathsf{Str}\, B \mid \Diamond \Box [\mathsf{hd}]\psi\} \longrightarrow \{\mathsf{Str}\, A \mid \Diamond \Box [\mathsf{hd}]\varphi\} \end{array} (\*)$$

These types mean that given f : B → A s.t. f(b) satisfies ϕ if b satisfies ψ, the function (map f) takes a stream with infinitely many (resp. ultimately all) elements satisfying ψ to one with infinitely many (resp. ultimately all) elements satisfying ϕ. For ϕ a formula over A, [hd]ϕ is a formula over streams of A's which holds on a given stream if ϕ holds on its head element.
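The pointwise content of the typings (∗) can be observed on finite prefixes. In the following hedged Python sketch (streams as head/thunk pairs; `map_stream`, `take`, `alternating` are our own names), f sends ψ-elements (odd numbers) to ϕ-elements (negative numbers), and mapping f over a stream with infinitely many odd elements yields one where every image of an odd element is negative.

```python
# Lazily map f over a stream represented as (head, thunk-of-tail).
def map_stream(f, s):
    head, tail = s
    return (f(head), lambda: map_stream(f, tail()))

def take(n, s):
    out = []
    for _ in range(n):
        head, tail = s
        out.append(head)
        s = tail()
    return out

def alternating(i=0):
    # 0, 1, 0, 1, ...: infinitely many elements satisfy psi = "is odd".
    return (i % 2, lambda: alternating(i + 1))

# f maps odd inputs to negative outputs (psi-elements to phi-elements).
out = take(6, map_stream(lambda b: -b, alternating()))
print(out)
```

Of course only the type system, not prefix testing, can establish the ✷✸ and ✸✷ properties for the whole infinite stream.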

It is undecidable whether a given higher-order program satisfies a given input-output temporal property written with formulae of the modal μ-calculus [41]. Having a type system is a partial workaround to this obstacle, which moreover enables compositional reasoning on programs, by distributing a specification over the various components of a program in order to prove its global specification.

Our system is built on top of the guarded λ-calculus [18], a higher-order programming language with guarded recursion [52]. Guarded recursion is a simple device to control and reason about unfoldings of fixpoints. It can represent coinductive types [50] and provides a syntactic compositional productivity check [5].

Safety properties (e.g. ✷[hd]ϕ) can be correctly represented with guarded fixpoints, but not liveness properties (e.g. ✸[hd]ϕ, ✸✷[hd]ϕ, ✷✸[hd]ϕ). Combining liveness with guarded recursion is a challenging problem since guarded fixpoints tend to have unique solutions. Existing approaches to handle temporal types in presence of guarded recursion face similar difficulties. Functional reactive programming (FRP) [21] provides a Curry-Howard correspondence for temporal logics [32,33,17] in which logical connectives are reflected as programming constructs. When combining FRP with guarded recursion [44,7], and in particular to handle liveness properties [8], uniqueness of guarded fixpoints is tempered by specific recursors for temporal types.

Our approach is different from [8], as we wish, as far as possible, to keep the logical level from impacting the program level. We propose a two-level system: a lower or internal level, which interacts with guarded recursion and at which only safety properties are correctly represented, and a higher or external one, at which liveness properties are correctly handled, but without direct access to guarded recursion. By restricting to the alternation-free modal μ-calculus, in which fixpoints can always be computed in ω steps, one can syntactically reason on finite unfoldings of liveness properties, thus allowing for crossing down the safety barrier. Soundness is proved by a realizability interpretation based on the semantics of guarded recursion in the topos of trees [13], which correctly represents the usual set-theoretic final coalgebras of polynomial coinductive types [50].

We provide example programs involving linear structures (colists, streams, fair streams [17,8]) and branching structures (resumptions à la [44]), for which

```
Consg := λx.λs. fold(x, s)                  : A → ▶Strg A → Strg A
hdg   := λs. π0(unfold s)                   : Strg A → A
tlg   := λs. π1(unfold s)                   : Strg A → ▶Strg A
mapg  := λf. fix(g). λs. Consg (f (hdg s)) (g ⊛ (tlg s))
                                            : (B → A) → Strg B → Strg A
```
**Fig. 1.** Constructor, Destructors and Map on Guarded Streams.

we prove liveness properties similar to (∗) above. Our system also handles safety properties on breadth-first (infinite) tree traversals à la [35] and [10].

**Organization of the paper.** We give an overview of our approach in §2. Then §3 presents the syntax of the guarded λ-calculus. Our base temporal logic (without liveness) is introduced in §4, and is used to define our refinement type system in §5. Liveness properties are handled in §6. The semantics is given in §7, and §8 presents examples. Finally, we discuss related work in §9 and future work in §10. Table 4 (§8) gathers the main refinement types we can give to example functions, most of them defined in Table 3. Omitted material is available in [28].

# **2 Outline**

**Overview of the Guarded** *λ***-Calculus.** Guarded recursion enforces productivity of programs using a type system equipped with a type modality ▶, in order to indicate that one has access to a value not right now but only "later". One can define guarded streams Str<sup>g</sup> A over a type A via the guarded recursive definition Str<sup>g</sup> A = A × ▶Str<sup>g</sup> A. Streams that inhabit this type have their head available now, but their tail only one step in the future. The type modality ▶ is reflected in programs with the next operation. One also has a fixpoint constructor on terms fix(x).M for guarded recursive definitions. They are typed with

$$\frac{\mathcal{E} \vdash M : A}{\mathcal{E} \vdash \mathtt{next}(M) : \blacktriangleright A} \qquad\qquad \frac{\mathcal{E}, x : \blacktriangleright A \vdash M : A}{\mathcal{E} \vdash \mathtt{fix}(x).M : A}$$

This allows for the constructor and basic destructors on guarded streams to be defined as in Fig. 1, where fold(−) and unfold(−) are explicit operations for folding and unfolding guarded recursive types. In the following, we use the infix notation a ::<sup>g</sup> s for Cons<sup>g</sup> a s. Using the fact that the type modality ▶ is an applicative functor [49], we can distribute ▶ over the arrow type. This is represented in the programming language by the infix applicative operator ⊛. With it, one can define the usual map function on guarded streams as in Fig. 1.
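The operations of Fig. 1 can be sketched in Python by rendering the later modality as a zero-argument thunk: `next_(v)` delays a value, `ap` plays the role of ⊛ on delayed values, and `fix` is the guarded fixpoint. All names here are our own choices for this illustration, not the paper's.

```python
def next_(v):
    # next : A -> later A
    return lambda: v

def ap(later_f, later_x):
    # distribute later over the arrow: later(B -> A) -> later B -> later A
    return lambda: later_f()(later_x())

def fix(f):
    # Guarded fixpoint: f may only use its argument under a thunk,
    # so `result` is bound by the time the thunk is forced.
    def later_result():
        return result
    result = f(later_result)
    return result

def cons_g(x, later_s):  # Cons^g : A -> later(Str^g A) -> Str^g A
    return (x, later_s)

def hd_g(s): return s[0]   # Str^g A -> A
def tl_g(s): return s[1]   # Str^g A -> later(Str^g A)

def map_g(f):
    # map^g, mirroring Fig. 1: the recursive call g is used under ap.
    return fix(lambda g: lambda s: cons_g(f(hd_g(s)), ap(g, tl_g(s))))

threes = fix(lambda later_s: cons_g(3, later_s))   # fix(s). 3 ::^g s
s = cons_g(5, next_(threes))                       # 5 ::^g next(threes)
doubled = map_g(lambda x: 2 * x)(s)
print(hd_g(doubled), hd_g(tl_g(doubled)()))
```

Note that `fix` would loop if its argument forced the thunk immediately; the thunk discipline is precisely the guardedness condition enforced by the ▶ modality.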

**Compositional Safety Reasoning on Streams.** Given a property ϕ on a type A, we would like to consider a subtype of Str<sup>g</sup> A that selects those streams whose elements all satisfy ϕ. To do so, we use a temporal modal formula ✷[hd]ϕ,


**Table 1.** Syntactic Classes and Judgments.

and consider the refinement type {Str<sup>g</sup> A | ✷[hd]ϕ}. Suppose for now that we can give the following refinement types to the basic stream operations:

$$\begin{array}{c} \mathsf{hd}^{\mathsf{g}} : \{\mathsf{Str}^{\mathsf{g}} A \mid \Box[\mathsf{hd}]\varphi\} \longrightarrow \{A \mid \varphi\} \\ \mathsf{tl}^{\mathsf{g}} : \{\mathsf{Str}^{\mathsf{g}} A \mid \Box[\mathsf{hd}]\varphi\} \longrightarrow \blacktriangleright\{\mathsf{Str}^{\mathsf{g}} A \mid \Box[\mathsf{hd}]\varphi\} \\ \mathsf{Cons}^{\mathsf{g}} : \{A \mid \varphi\} \longrightarrow \blacktriangleright\{\mathsf{Str}^{\mathsf{g}} A \mid \Box[\mathsf{hd}]\varphi\} \longrightarrow \{\mathsf{Str}^{\mathsf{g}} A \mid \Box[\mathsf{hd}]\varphi\} \end{array}$$

By using the standard typing rules for λ-abstraction and application, together with the rules to type fix(x).M and ⊛, we can type the function map<sup>g</sup> as

$$\mathsf{map}^{\mathsf{g}} : (\{B \mid \psi\} \to \{A \mid \varphi\}) \longrightarrow \{\mathsf{Str}^{\mathsf{g}} B \mid \Box[\mathsf{hd}]\psi\} \longrightarrow \{\mathsf{Str}^{\mathsf{g}} A \mid \Box[\mathsf{hd}]\varphi\}.$$

**A Manysorted Temporal Logic.** Our logical language, taken with minor adaptations from [30], is manysorted: for each type A we have formulae of type A (notation ϕ : A), where ϕ selects inhabitants of A.

We use atomic modalities ([πi], [fold], [next], ...) in refinements to navigate between types (see Fig. 5, §4). For instance, a formula ϕ of type A0, specifying a property over the inhabitants of A0, can be lifted to the formula [π0]ϕ of type A0 × A1, which intuitively describes those inhabitants of A0 × A1 whose first component satisfies ϕ. Given a formula ϕ of type A, one can define its "head lift" [hd]ϕ of type Str<sup>g</sup> A, which enforces ϕ to be satisfied on the head of the provided stream. Also, one can define a modality ○ such that given a formula ψ : Str<sup>g</sup> A, the formula ○ψ : Str<sup>g</sup> A enforces ψ to be satisfied on the tail of the provided stream. These modalities are obtained resp. as [hd]ϕ := [fold][π0]ϕ and ○ϕ := [fold][π1][next]ϕ. We similarly have atomic modalities [in0], [in1] on sum types. For instance, on the type of guarded colists defined as CoList<sup>g</sup> A := Fix(X). **1** + A × ▶X, we can express the fact that a colist is empty (resp. nonempty) with the formula [nil] := [fold][in0]⊤ (resp. [¬nil] := [fold][in1]⊤).

We also provide a deduction system ⊢<sup>A</sup> ϕ on temporal modal formulae. This deduction system is used to define a subtyping relation T ≤ U between refinement types, with {A | ϕ} ≤ {A | ψ} when ⊢<sup>A</sup> ϕ ⇒ ψ. The subtyping relation thus incorporates logical reasoning in the type system.

In addition, we have greatest fixpoint formulae ναϕ (so that formulae can have free typed propositional variables), equipped with Kozen's reasoning principles [43]. In particular, we can form an always modality as ✷ϕ := να. ϕ ∧ ○α, with ✷ϕ : Str<sup>g</sup> A if ϕ : Str<sup>g</sup> A. The formula ✷ϕ holds on a stream s = (s<sub>i</sub> | i ≥ 0) iff ϕ holds on every substream (s<sub>i</sub> | i ≥ n) for n ≥ 0. If we rather start with ψ : A, one first needs to lift it to [hd]ψ : Str<sup>g</sup> A. Then ✷[hd]ψ means that all the elements of the stream satisfy ψ, since all its suffixes satisfy [hd]ψ.
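Operationally, on an infinite stream the greatest fixpoint ✷[hd]ψ can be refuted in finite time but never fully verified by observation. The following hedged sketch (streams again as head/thunk pairs; `box_upto` and `nats` are our own names) checks ψ on the heads of the first k suffixes only.

```python
# Bounded observation of Box [hd] psi: check psi on the first k heads.
def box_upto(k, psi, s):
    for _ in range(k):
        head, tail = s
        if not psi(head):
            return False   # some suffix violates [hd]psi: Box fails outright
        s = tail()
    return True            # no counterexample among the first k suffixes

def nats(i=0):
    return (i, lambda: nats(i + 1))

print(box_upto(10, lambda x: x >= 0, nats()))  # no violation found
print(box_upto(10, lambda x: x < 5, nats()))   # refuted at element 5
```

A `True` answer is only the absence of a counterexample in the observed prefix; establishing ✷[hd]ψ outright is the job of the type system.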

Table 1 summarizes the different judgments used in this paper.

**Beyond Safety.** In order to handle liveness properties, we also need to have least fixpoint formulae μαϕ. For example, this would give the eventually modality ✸ϕ := μα. ϕ ∨ ○α. With Kozen-style rules, one could then give the following two types to the guarded stream constructor:

$$\begin{array}{c} \mathsf{Cons}^{\mathsf{g}} : \{A \mid \varphi\} \longrightarrow \blacktriangleright \mathsf{Str}^{\mathsf{g}} A \longrightarrow \{\mathsf{Str}^{\mathsf{g}} A \mid \diamondsuit[\mathsf{hd}]\varphi\} \\ \mathsf{Cons}^{\mathsf{g}} : A \longrightarrow \blacktriangleright\{\mathsf{Str}^{\mathsf{g}} A \mid \diamondsuit[\mathsf{hd}]\varphi\} \longrightarrow \{\mathsf{Str}^{\mathsf{g}} A \mid \diamondsuit[\mathsf{hd}]\varphi\} \end{array}$$

But consider a finite base type B with two distinguished elements a, b, and suppose that we have access to a modality [b] on B so that terms inhabiting {B | [b]} must be equal to b. Using the above types for Cons<sup>g</sup>, we could type the stream with constant value a, defined as fix(s). a ::<sup>g</sup> s, with the type {Str<sup>g</sup> B | ✸[hd][b]} that is supposed to enforce the existence of an occurrence of b in the stream. Similarly, on colists we would have fix(s). a ::<sup>g</sup> s of type {CoList<sup>g</sup> B | ✸[nil]}, while ✸[nil] expresses that a colist will eventually contain a nil, and is thus finite. Hence, liveness properties may interact quite badly with guarded recursion. Let us look at this in a semantic model of guarded recursion.

*Internal* **Semantics in the Topos of Trees.** The types of the guarded λ-calculus can be interpreted as sequences of sets (X(n))<sub>n>0</sub>, where X(n) represents the values available "at time n". In order to interpret guarded recursion, one also needs to have access to functions r<sup>X</sup><sub>n</sub> : X(n + 1) → X(n), which tell how values "at n+1" can be restricted (actually most often truncated) to values "at n". This means that the objects used to represent types are in fact presheaves over the poset (ℕ \ {0}, ≤). The category S of such presheaves is the topos of trees [13]. For instance, the type Str<sup>g</sup> B of guarded streams over a finite base type B is interpreted in S as (B<sup>n</sup>)<sub>n>0</sub>, with restriction maps taking (b<sub>0</sub>, ..., b<sub>n−1</sub>, b<sub>n</sub>) to (b<sub>0</sub>, ..., b<sub>n−1</sub>). We write ⟦A⟧ for the interpretation of a type A in S.
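This stage-wise reading can be made concrete. In the sketch below (our own names `approx`, `restrict`, `bits`), the stage-n value of a stream is its length-n prefix, an element of B<sup>n</sup>, and the restriction map is truncation; restricting the stage-(n+1) value recovers the stage-n value.

```python
def approx(n, s):
    # Interpret a stream "at time n": its first n elements, in B^n.
    out = []
    for _ in range(n):
        head, tail = s
        out.append(head)
        s = tail()
    return out

def restrict(xs):
    # r_n : X(n+1) -> X(n), here truncation of the last element.
    return xs[:-1]

def bits(i=0):
    # The stream 0, 1, 0, 1, ... over B = {0, 1}.
    return (i % 2, lambda: bits(i + 1))

# The approximations are compatible with the restriction maps.
print(restrict(approx(4, bits())) == approx(3, bits()))
```

A presheaf over (ℕ \ {0}, ≤) is exactly such a family of stages together with compatible restriction maps.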

**The Necessity of an** *External* **Semantics.** The topos of trees cannot correctly handle liveness properties. For instance, the formula ✸[hd][b] cannot describe in S the set of streams that contain at least one occurrence of b. Indeed, the interpretation of ✸[hd][b] in S is a sequence (C(n))<sub>n>0</sub> with C(n) ⊆ B<sup>n</sup>. But any element of B<sup>n</sup> can be extended to a stream which contains an occurrence of b. Hence C(n) should be equal to B<sup>n</sup>, and the interpretation of ✸[hd][b] is the whole ⟦Str<sup>g</sup> B⟧. More generally, guarded fixpoints have unique solutions in the topos of trees [13], and ✸ϕ = μα. ϕ ∨ ○α gets the same interpretation as να. ϕ ∨ ○α.

We thus have a formal system with least and greatest fixpoints, that has a semantics inside the topos of trees, but which does not correctly handle least fixpoints. On the other hand, it was shown by [50] that the interpretation of guarded polynomial (i.e. first-order) recursive types in S induces final coalgebras for the corresponding polynomial functors on the category **Set** of usual sets and functions. This applies e.g. to streams and colists. Hence, it makes sense to think of interpreting least fixpoint formulae over such types externally, in **Set**.

**Fig. 2.** Internal and External Semantics

**The Constant Type Modality.** Figure 2 represents adjoint functors Γ : S → **Set** and Δ : **Set** → S. To correctly handle least fixpoints μαϕ : A, we would like to see them as subsets of Γ⟦A⟧ in **Set** rather than subobjects of ⟦A⟧ in S. On the other hand, the internal semantics in S is still necessary to handle definitions by guarded recursion. We navigate between the internal semantics in S and the external semantics in **Set** via the adjunction Δ ⊣ Γ. This adjunction induces a comonad ΔΓ on S, which is represented in the guarded λ-calculus of [18] by the constant type modality ■. This gives coinductive versions of guarded recursive types, e.g. Str A := ■Str<sup>g</sup> A for streams and CoList A := ■CoList<sup>g</sup> A for colists, which allow for productive but not causal programs [18, Ex. 1.10.(3)].

Each formula gets two interpretations: ⟦ϕ⟧ in S and {|ϕ|} in **Set**. The external semantics {|ϕ|} handles least fixpoints in the standard set-theoretic way, thus the two interpretations differ in general. But we do have {|ϕ|} = Γ⟦ϕ⟧ when ϕ is safe (Def. 6.5), that is, when ϕ describes a safety property. We have a modality [box]ϕ which lifts ϕ : A to ■A. By defining ⟦[box]ϕ⟧ := Δ{|ϕ|}, we correctly handle the least fixpoints which are guarded by a [box] modality. When ϕ is safe, we can navigate between {■A | [box]ϕ} and ■{A | ϕ}, thus making available the comonad structure of ■ on [box]ϕ. Note that [box] is unrelated to ✷.

**Approximating Least Fixpoints.** For proving liveness properties on functions defined by guarded recursion, one needs to navigate between e.g. [box]✸ϕ and ✸ϕ, while ✸ϕ is in general unsafe. The fixpoint ✸ϕ = μα. ϕ ∨ ○α is alternation-free (see e.g. [16, §4.1]). This implies that ✸ϕ can be seen as the supremum of the ○<sup>m</sup>ϕ for m ∈ ℕ, where each ○<sup>m</sup>ϕ is safe when ϕ is safe. More generally, we can approximate alternation-free μαϕ by their finite unfoldings ϕ<sup>m</sup>(⊥), à la Kleene. We extend the logic with finite iterations μ<sup>k</sup>αϕ, where k is an iteration variable, and where μ<sup>k</sup>αϕ is seen as ϕ<sup>k</sup>(⊥). Let ✸<sup>k</sup>ϕ := μ<sup>k</sup>α. ϕ ∨ ○α. If ϕ is safe then so is ✸<sup>k</sup>ϕ. For safe ϕ, ψ, we have the following refinement typings for the guarded recursive map<sup>g</sup> and its coinductive lift map:

map<sup>g</sup> : ({B | ψ} → {A | ϕ}) → {Str<sup>g</sup> B | ✸<sup>k</sup>[hd]ψ} → {Str<sup>g</sup> A | ✸<sup>k</sup>[hd]ϕ}
map : ({B | ψ} → {A | ϕ}) → {Str B | [box]✸[hd]ψ} → {Str A | [box]✸[hd]ϕ}
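The finite unfolding ✸<sup>k</sup> can be read operationally as a bounded search: ϕ<sup>k</sup>(⊥) holds iff [hd]ϕ holds within the first k suffixes, with the truncated fixpoint ⊥ counting as false. A hedged sketch (our own names `diamond_k`, `nats`, with streams as head/thunk pairs):

```python
# k-step unfolding of mu alpha. phi \/ (next alpha):
# does some head among the first k suffixes satisfy phi?
def diamond_k(k, phi, s):
    for _ in range(k):
        head, tail = s
        if phi(head):
            return True
        s = tail()
    return False  # phi^k(bot): undetermined within k unfoldings counts as false

def nats(i=0):
    return (i, lambda: nats(i + 1))

print(diamond_k(5, lambda x: x == 3, nats()))
print(diamond_k(3, lambda x: x == 3, nats()))
```

Unlike the full ✸, each ✸<sup>k</sup> is decidable by finite observation, which is why it stays on the safe side of the internal semantics.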

# **3 The Pure Calculus**

Our system lies on top of the guarded λ-calculus of [18]. We briefly review it here. We consider values and terms from the grammar given in Fig. 3 (left). In


**Fig. 3.** Syntax and Operational Semantics of the Pure Calculus.

both box<sub>σ</sub>(M) and prev<sub>σ</sub>(M), σ is a delayed substitution of the form σ = [x1 → M1, ..., xk → Mk], and box<sub>σ</sub>(M) and prev<sub>σ</sub>(M) bind x1, ..., xk in M. We use the following conventions of [18]: box(M) and prev(M) (without indicated substitution) stand resp. for box<sub>[]</sub>(M) and prev<sub>[]</sub>(M), i.e. bind no variable of M. Moreover, box<sub>ι</sub>(M) stands for box<sub>[x1→x1,...,xk→xk]</sub>(M) where x1, ..., xk is a list of all free variables of M, and similarly for prev<sub>ι</sub>(M). We consider the weak call-by-name reduction of [18], recalled in Fig. 3 (right).

Pure types (notation A, B, etc.) are the closed types over the grammar

A ::= **1** | A + A | A × A | A → A | ▶A | X | Fix(X).A | ■A

where, (1) in the case Fix(X).A, each occurrence of X in A must be guarded by a ▶, and (2) in the case of ■A, the type A is closed (i.e. has no free type variable). Guarded recursive types are built with the fixpoint constructor Fix(X).A, which allows for X to appear in A both at positive and negative positions, but only under a ▶. In this paper we shall only consider positive types.

Example 3.1. We can code a finite base type B = {b1, ..., bn} as a sum of unit types **1** + (··· + **1**) with n summands, where the ith component of the sum is intended to represent the element b<sub>i</sub> of B. At the term level, the elements of B are represented as compositions of injections in<sub>j1</sub>(in<sub>j2</sub>(... in<sub>ji</sub>)). For instance, Booleans are represented by Bool := **1** + **1**, with tt := in0() and ff := in1().
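This coding can be sketched concretely. Below, elements of a finite base type are nested tagged injections into sums of the unit type `()`, and a composition of injection modalities is read as a predicate following the tag path; the tags `"in0"`/`"in1"` and the helper `matches` are our own devices, not the paper's syntax.

```python
# Injections into a sum; the payload defaults to the unit value ().
def in0(v=()): return ("in0", v)
def in1(v=()): return ("in1", v)

tt = in0()   # Bool := 1 + 1, tt := in0()
ff = in1()

def matches(path, v):
    # The formula [in_j1]...[in_ji] as a predicate: follow the tag path.
    for tag in path:
        t, v = v
        if t != tag:
            return False
    return True

print(matches(["in0"], tt), matches(["in0"], ff))
```

This also illustrates Ex. 4.1 below: a composition of injection modalities carves out a singleton subset of the base type.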

Example 3.2. Besides streams (Str<sup>g</sup> A), colists (CoList<sup>g</sup> A), conatural numbers (CoNat<sup>g</sup> ) and infinite binary trees (Tree<sup>g</sup> A), we consider a type Res<sup>g</sup> A of resumptions (parametrized by I, O) adapted from [44], and a higher-order recursive type Rou<sup>g</sup> A, used in Martin Hofmann's breadth-first tree traversal (see e.g. [10]):

$$\begin{array}{ll} \mathsf{Tree}^{\mathsf{g}} A := \mathsf{Fix}(X). \ A \times (\blacktriangleright X \times \blacktriangleright X) & \mathsf{CoNat}^{\mathsf{g}} := \mathsf{Fix}(X). \ \mathsf{1} + \blacktriangleright X \\\mathsf{Res}^{\mathsf{g}} A := \mathsf{Fix}(X). \ A + (\mathsf{I} \to (\mathsf{O} \times \blacktriangleright X)) & \mathsf{Rou}^{\mathsf{g}} A := \mathsf{Fix}(X). \ \mathsf{1} + ((\blacktriangleright X \to \blacktriangleright A) \to A) \end{array}$$

Some typing rules of the pure calculus are given in Fig. 4, where a pure type A is constant if each occurrence of ▶ in A is guarded by a ■. The omitted rules are the standard ones for simple types with finite sums and products [28, §A].

$$\frac{\mathcal{E} \vdash M : A[\mathsf{Fix}(X).A/X]}{\mathcal{E} \vdash \mathtt{fold}(M) : \mathsf{Fix}(X).A} \qquad \frac{\mathcal{E} \vdash M : \mathsf{Fix}(X).A}{\mathcal{E} \vdash \mathtt{unfold}(M) : A[\mathsf{Fix}(X).A/X]} \qquad \frac{\mathcal{E} \vdash M : \blacktriangleright(B \to A) \quad \mathcal{E} \vdash N : \blacktriangleright B}{\mathcal{E} \vdash M \circledast N : \blacktriangleright A} \qquad \frac{\mathcal{E} \vdash M : A}{\mathcal{E} \vdash \mathtt{next}(M) : \blacktriangleright A}$$

$$\frac{x_1 : A_1, \ldots, x_k : A_k \vdash M : \blacktriangleright A \qquad \mathcal{E} \vdash M_i : A_i \ \text{with } A_i \text{ constant, for } 1 \le i \le k}{\mathcal{E} \vdash \mathtt{prev}_{[x_1 \mapsto M_1, \ldots, x_k \mapsto M_k]}(M) : A}$$

$$\frac{x_1 : A_1, \ldots, x_k : A_k \vdash M : A \qquad \mathcal{E} \vdash M_i : A_i \ \text{with } A_i \text{ constant, for } 1 \le i \le k}{\mathcal{E} \vdash \mathtt{box}_{[x_1 \mapsto M_1, \ldots, x_k \mapsto M_k]}(M) : \blacksquare A} \qquad \frac{\mathcal{E} \vdash M : \blacksquare A}{\mathcal{E} \vdash \mathtt{unbox}(M) : A}$$

**Fig. 4.** Some Typing Rules of the Pure Calculus.

Example 3.3. Figure 1 defines some operations on guarded streams. On the other types of Ex. 3.2, we have e.g. the constructors of colists Nil<sup>g</sup> := fold(in0()) : CoList<sup>g</sup> A and Cons<sup>g</sup> := λx.λxs. fold(in1x, xs) : A → ▶CoList<sup>g</sup> A → CoList<sup>g</sup> A. Infinite binary trees Tree<sup>g</sup> A have operations son<sup>g</sup><sub>d</sub> : Tree<sup>g</sup> A → ▶Tree<sup>g</sup> A for d ∈ {ℓ, r}, Node<sup>g</sup> : A → ▶Tree<sup>g</sup> A → ▶Tree<sup>g</sup> A → Tree<sup>g</sup> A and label<sup>g</sup> : Tree<sup>g</sup> A → A.

Example 3.4. Coinductive types are guarded recursive types under a ■. For instance Str A := ■Str<sup>g</sup> A, CoList A := ■CoList<sup>g</sup> A, CoNat := ■CoNat<sup>g</sup> and Res A := ■Res<sup>g</sup> A, with A, I, O constant. Basic operations on guarded types lift to coinductive ones. For instance

$$\begin{array}{lcl} \mathsf{Cons} := \lambda x. \lambda s. \mathsf{box}\_{\iota} \left( \mathsf{Cons}^{\mathsf{g}} \ x \ \mathsf{next} (\mathsf{unbox} \ s) \right) : A \to \mathsf{Str} \, A \to \mathsf{Str} \, A \\\mathsf{hd} := \lambda s. \mathsf{hd}^{\mathsf{g}} \left( \mathsf{unbox} \ s \right) & : \mathsf{Str} \, A \to A \\\mathsf{tl} := \lambda s. \mathsf{box}\_{\iota} \left( \mathsf{prev}\_{\iota} (\mathsf{tl}^{\mathsf{g}} \ (\mathsf{unbox} \ s)) \right) & : \mathsf{Str} \, A \to \mathsf{Str} \, A \end{array}$$

These definitions follow a general pattern to lift a function over a guarded recursive type into one over its coinductive version, by performing an η-expansion with some box and unbox inserted in the right places. For example, one can define the map function on coinductive streams as:

map := λf.λs. box<sub>ι</sub>(map<sup>g</sup> f (unbox s)) : (B → A) −→ Str B −→ Str A

# **4 A Temporal Modal Logic**

We present here a logic of (modal) temporal specifications. We focus on syntactic aspects. The semantics is discussed in §7. For the moment the logic has only one form of fixpoints (ναϕ). It is extended with least fixpoints (μαϕ) in §6.

**Manysorted Modal Temporal Formulae.** The main ingredient of this paper is the logical language we use to annotate pure types when forming refinement types. This language, that we took with minor adaptations from [30], is manysorted: for each pure type A we have formulae ϕ of type A (notation ϕ : A). The formulation rules of formulae are given in Fig. 5.

Example 4.1. Given a finite base type B = {b1, ..., bn} as in Ex. 3.1, with element b<sub>i</sub> represented by in<sub>j1</sub>(in<sub>j2</sub>(... in<sub>ji</sub>)), the formula [in<sub>j1</sub>][in<sub>j2</sub>] ... [in<sub>ji</sub>]⊤ represents the singleton subset {b<sub>i</sub>} of B. On Bool, we have the formulae [tt] := [in0]⊤ and [ff] := [in1]⊤ representing resp. tt and ff.

$$\frac{(\alpha:A)\in\Sigma}{\Sigma\vdash\alpha:A} \qquad \frac{}{\Sigma\vdash\top:A} \qquad \frac{}{\Sigma\vdash\bot:A} \qquad \frac{\Sigma\vdash\varphi:A}{\Sigma,\alpha:B\vdash\varphi:A}$$

$$\frac{\Sigma\vdash\varphi:A \quad \Sigma\vdash\psi:A}{\Sigma\vdash\varphi\Rightarrow\psi:A} \qquad \frac{\Sigma\vdash\varphi:A \quad \Sigma\vdash\psi:A}{\Sigma\vdash\varphi\wedge\psi:A} \qquad \frac{\Sigma\vdash\varphi:A \quad \Sigma\vdash\psi:A}{\Sigma\vdash\varphi\vee\psi:A}$$

$$\frac{\Sigma\vdash\varphi:A_{i}}{\Sigma\vdash[\pi_{i}]\varphi:A_{0}\times A_{1}} \qquad \frac{\Sigma\vdash\varphi:A_{i}}{\Sigma\vdash[\mathsf{in}_{i}]\varphi:A_{0}+A_{1}} \qquad \frac{\Sigma\vdash\psi:B \quad \Sigma\vdash\varphi:A}{\Sigma\vdash[\mathsf{ev}(\psi)]\varphi:B\to A}$$

$$\frac{\Sigma\vdash\varphi:A[\mathsf{Fix}(X).A/X]}{\Sigma\vdash[\mathsf{fold}]\varphi:\mathsf{Fix}(X).A} \qquad \frac{\Sigma\vdash\varphi:A}{\Sigma\vdash[\mathsf{next}]\varphi:\blacktriangleright A} \qquad \frac{\vdash\varphi:A}{\vdash[\mathsf{box}]\varphi:\blacksquare A}$$

$$(\nu\text{-F})\ \frac{\Sigma,\alpha:A\vdash\varphi:A \qquad \alpha\ \text{guarded in}\ \varphi \qquad \alpha\ \mathrm{Pos}\ \varphi}{\Sigma\vdash\nu\alpha\varphi:A}$$

**Fig. 5.** Formation Rules of Formulae (where A, B are pure types).

Example 4.2. (a) The formula [hd][a] ⇒ ○[hd][b] means that if the head of a stream is a, then its second element (the head of its tail) should be b.


Formulae have fixpoints ναϕ. The rules of Fig. 5 thus allow for the formation of formulae with free typed propositional variables (ranged over by α, β, ...), and involve contexts Σ of the form α1 : A1, ..., αn : An. In the formation of a fixpoint, the side condition "α guarded in ϕ" asks that each occurrence of α is beneath a [next] modality. Because we are ultimately interested in the external set-theoretic semantics of formulae, we assume a usual positivity condition of α in ϕ. It is defined with relations α Pos ϕ and α Neg ϕ (see [28, §B]). We just mention here that [ev(−)](−) is contravariant in its first argument. Note that [box]ϕ can only be formed for closed ϕ.



**Table 2.** Modal Axioms and Rules. Types are omitted in ⊢, and **(C)** marks axioms assumed for ⊢<sub>c</sub> but not for ⊢. Properties of the non-atomic [hd] and ○ are derived.

**Modal Theories.** Formulae are equipped with a modal deduction system which enters the type system via a subtyping relation (§5). For each pure type A, we have an intuitionistic theory ⊢<sup>A</sup> (the general case) and a classical theory ⊢<sup>A</sup><sub>c</sub> (which is only assumed under ■/[box]), summarized in Fig. 6 and Table 2 (where we also give properties of the derived modalities [hd], ○). In any case, ⊢<sup>A</sup><sub>(c)</sub> ϕ is only defined when ϕ : A (and so when ϕ has no free propositional variable).

Fixpoints ναϕ are equipped with their usual Kozen axioms [43]. The atomic modalities [π_i], [fold], [next], [in_i] and [box] have deterministic branching (see Fig. 12, §7). We can get the axioms of the intuitionistic (normal) modal logic **IK** [56] (see also e.g. [60,48]) for [π_i], [fold] and [box], but not for [in_i] nor for the intuitionistic [next]. For [next], in the intuitionistic case this is due to semantic issues with step indexing (discussed in §7) which are absent from the classical case. As for [in_i], we have a logical theory allowing for a coding of finite base types as finite sum types, which allows us to derive, for a finite base type B:

$$\vdash^{\mathsf{B}} \quad \bigvee\_{\mathsf{a}\in\mathsf{B}} \left( [\mathsf{a}] \quad \wedge \quad \bigwedge\_{\mathsf{b}\neq\mathsf{a}} \neg[\mathsf{b}] \right),$$

**Definition 4.4 (Modal Theories).** For each pure type A, the intuitionistic and classical modal theories ⊢^A ϕ and ⊢^A_c ϕ (where ⊢ ϕ : A) are defined by mutual induction:


For example, we have ⊢^{Strᵍ A} ✷ψ ⇒ (ψ ∧ ◯✷ψ) and ⊢^{Strᵍ A} (ψ ∧ ◯✷ψ) ⇒ ✷ψ.

$$\begin{array}{c} \dfrac{\vdash^{B}\psi\Rightarrow\phi \qquad \vdash\varphi:A}{\vdash^{B\to A}[\mathsf{ev}(\phi)]\varphi\Rightarrow[\mathsf{ev}(\psi)]\varphi} \qquad \vdash^{B\to A}([\mathsf{ev}(\psi_0)]\varphi\land[\mathsf{ev}(\psi_1)]\varphi)\Rightarrow[\mathsf{ev}(\psi_0\lor\psi_1)]\varphi \\\\ (\mathrm{CL})\ \vdash^{A}_{\mathrm{c}}((\varphi\Rightarrow\psi)\Rightarrow\varphi)\Rightarrow\varphi \qquad \dfrac{\vdash^{A}_{\mathrm{c}}\varphi}{\vdash^{\blacksquare A}[\mathsf{box}]\varphi} \\\\ \vdash^{A_0+A_1}([\mathsf{in}_0]\top\lor[\mathsf{in}_1]\top)\land\lnot([\mathsf{in}_0]\top\land[\mathsf{in}_1]\top) \qquad \vdash^{A_0+A_1}[\mathsf{in}_i]\top\Rightarrow(\lnot[\mathsf{in}_i]\varphi\Leftrightarrow[\mathsf{in}_i]\lnot\varphi) \\\\ \vdash^{A}\nu\alpha\varphi\Rightarrow\varphi[\nu\alpha\varphi/\alpha] \qquad \dfrac{\vdash^{A}\psi\Rightarrow\varphi[\psi/\alpha]}{\vdash^{A}\psi\Rightarrow\nu\alpha\varphi} \end{array}$$

**Fig. 6.** Modal Axioms and Rules.

$$\begin{array}{c} \overline{T \le |T|} \qquad \overline{A \le \{A \mid \top\}} \qquad \dfrac{\vdash^{A}\varphi\Rightarrow\psi}{\{A \mid \varphi\} \le \{A \mid \psi\}} \qquad \dfrac{\vdash^{A}_{\mathrm{c}}\varphi\Rightarrow\psi}{\{\blacksquare A \mid [\mathsf{box}]\varphi\} \le \{\blacksquare A \mid [\mathsf{box}]\psi\}} \\\\ \overline{\{\blacktriangleright A \mid [\mathsf{next}]\varphi\} \equiv \blacktriangleright\{A \mid \varphi\}} \qquad \overline{\{B \to A \mid [\mathsf{ev}(\psi)]\varphi\} \equiv \{B \mid \psi\} \to \{A \mid \varphi\}} \end{array}$$

**Fig. 7.** Subtyping Rules (excerpt).

# **5 A Temporally Refined Type System**

Temporal refinement types (or types), notation T, U, V, etc., are defined by:

$$T, U ::= A \mid \{A \mid \varphi\} \mid T + T \mid T \times T \mid T \to T \mid \blacktriangleright T \mid \blacksquare T$$

where ⊢ ϕ : A and, in the case of ■T, the type T has no free type variable. So types are built from (closed) pure types A and temporal refinements {A | ϕ}. They allow for all the type constructors of pure types.

As a refinement type {A | ϕ} intuitively represents a subset of the inhabitants of A, it is natural to equip our system with a notion of subtyping. In addition to the usual rules for product, arrow and sum types, our subtyping relation is made of two more ingredients. The first follows the principle that our refinement type system is meant to prove properties of programs, and not to type more programs, so that (say) a type of the form {A | ϕ} → {B | ψ} is a subtype of A → B. We formalize this with the notion of the underlying pure type |T| of a type T. The second ingredient is the modal theory ⊢^A of §4. The subtyping rules concerning refinements are given in Fig. 7, where T ≡ U enforces both T ≤ U and U ≤ T. The full set of rules is given in [28, §C]. Notice that subtyping does not incorporate (un)folding of guarded recursive types.

Typing for refinement types is given by the rules of Fig. 8, together with the rules of §3 extended to refinement types, where T is constant if |T| is constant. The modalities [π_i], [in_i], [fold] and [ev(−)] (but not [next]) have introduction rules extending those of the corresponding term formers.

$$\begin{array}{c} (\pi_i\text{-I})\ \dfrac{\mathcal{E}\vdash M_i:\{A_i\mid\varphi\} \quad \mathcal{E}\vdash M_{1-i}:A_{1-i}}{\mathcal{E}\vdash\langle M_0, M_1\rangle:\{A_0\times A_1\mid[\pi_i]\varphi\}} \qquad (\pi_i\text{-E})\ \dfrac{\mathcal{E}\vdash M:\{A_0\times A_1\mid[\pi_i]\varphi\}}{\mathcal{E}\vdash\pi_i(M):\{A_i\mid\varphi\}} \\\\ (\mathrm{Ev\text{-}I})\ \dfrac{\mathcal{E},x:\{B\mid\psi\}\vdash M:\{A\mid\varphi\}}{\mathcal{E}\vdash\lambda x.M:\{B\to A\mid[\mathsf{ev}(\psi)]\varphi\}} \qquad (\mathrm{Ev\text{-}E})\ \dfrac{\mathcal{E}\vdash M:\{B\to A\mid[\mathsf{ev}(\psi)]\varphi\} \quad \mathcal{E}\vdash N:\{B\mid\psi\}}{\mathcal{E}\vdash MN:\{A\mid\varphi\}} \\\\ (\mathrm{Fd\text{-}I})\ \dfrac{\mathcal{E}\vdash M:\{A[\mathsf{Fix}(X).A/X]\mid\varphi\}}{\mathcal{E}\vdash\mathsf{fold}(M):\{\mathsf{Fix}(X).A\mid[\mathsf{fold}]\varphi\}} \qquad (\mathrm{Fd\text{-}E})\ \dfrac{\mathcal{E}\vdash M:\{\mathsf{Fix}(X).A\mid[\mathsf{fold}]\varphi\}}{\mathcal{E}\vdash\mathsf{unfold}(M):\{A[\mathsf{Fix}(X).A/X]\mid\varphi\}} \\\\ (\mathrm{Inj}_i\text{-I})\ \dfrac{\mathcal{E}\vdash M:\{A_i\mid\varphi\}}{\mathcal{E}\vdash\mathsf{in}_i(M):\{A_0+A_1\mid[\mathsf{in}_i]\varphi\}} \qquad (\mathrm{Inj}_i\text{-E})\ \dfrac{\mathcal{E}\vdash M:\{A_0+A_1\mid[\mathsf{in}_i]\varphi\} \quad \mathcal{E},x:\{A_i\mid\varphi\}\vdash N_i:U \quad \mathcal{E},x:A_{1-i}\vdash N_{1-i}:U}{\mathcal{E}\vdash\mathsf{case}\ M\ \mathsf{of}\ (x.N_0\mid x.N_1):U} \\\\ (\lor\text{-E})\ \dfrac{\mathcal{E}\vdash M:\{A\mid\varphi_0\lor\varphi_1\} \quad \mathcal{E},x:\{A\mid\varphi_i\}\vdash N:U\ \ (i\in\{0,1\})}{\mathcal{E}\vdash N[M/x]:U} \end{array}$$

$$(\text{MP})\ \dfrac{\mathcal{E}\vdash M:\{A\mid\psi\Rightarrow\varphi\} \quad \mathcal{E}\vdash M:\{A\mid\psi\}}{\mathcal{E}\vdash M:\{A\mid\varphi\}} \qquad (\text{ExF})\ \dfrac{\mathcal{E}\vdash N:\{A\mid\bot\}}{\mathcal{E}\vdash N:U}$$

$$(\text{Sub})\ \dfrac{\mathcal{E}\vdash M:T \quad T\le U}{\mathcal{E}\vdash M:U}$$

**Fig. 8.** Typing Rules for Refinement Types.

Example 5.1. Since ⊢ ϕ ⇒ (ψ ⇒ (ϕ ∧ ψ)) and applying the rule (MP) twice, we get the first derived rule below, from which we can deduce the second one:

$$\begin{array}{c} \mathcal{E} \vdash M : \{A \mid \varphi\} \quad \mathcal{E} \vdash M : \{A \mid \psi\} \\ \hline \mathcal{E} \vdash M : \{A \mid \varphi \land \psi\} \end{array} \qquad \begin{array}{c} \mathcal{E} \vdash M : \{A \mid \varphi\} \quad \mathcal{E} \vdash N : \{B \mid \psi\} \\ \hline \mathcal{E} \vdash \langle M, N \rangle : \{A \times B \mid [\pi\_0] \varphi \land [\pi\_1] \psi\} \end{array}$$

Example 5.2. We have the following derived rules:

$$\frac{\mathcal{E}\vdash M:\{\mathsf{Str}^{\mathfrak{g}}A\mid\Box\varphi\}}{\mathcal{E}\vdash M:\{\mathsf{Str}^{\mathfrak{g}}A\mid\varphi\land\bigcirc\Box\varphi\}}\qquad\quad\text{and}\qquad\frac{\mathcal{E}\vdash M:\{\mathsf{Str}^{\mathfrak{g}}A\mid\varphi\land\bigcirc\Box\varphi\}}{\mathcal{E}\vdash M:\{\mathsf{Str}^{\mathfrak{g}}A\mid\Box\varphi\}}$$

Example 5.3. We have Consᵍ : A → ▶{Strᵍ A | ϕ} → {Strᵍ A | ◯ϕ} as well as tlᵍ : {Strᵍ A | ◯ϕ} → ▶{Strᵍ A | ϕ}.

Example 5.4 ("Always" (✷) on Guarded Streams). The refined types of Consᵍ, hdᵍ, tlᵍ and mapᵍ mentioned in §2 are easy to derive. We also have the type

$$\{\mathsf{Str}^{\mathsf{g}} A \mid \Box[\mathsf{hd}]\varphi\_{0}\} \longrightarrow \{\mathsf{Str}^{\mathsf{g}} A \mid \Box[\mathsf{hd}]\varphi\_{1}\} \longrightarrow \{\mathsf{Str}^{\mathsf{g}} A \mid \Box([\mathsf{hd}]\varphi\_{0} \vee [\mathsf{hd}]\varphi\_{1})\}$$

for the merge<sup>g</sup> function which takes two guarded streams and interleaves them:

$$\begin{array}{rl} \mathsf{merge^g}: & \mathsf{Str^g}A \longrightarrow \mathsf{Str^g}A \longrightarrow \mathsf{Str^g}A \\ := & \mathsf{fix}(g).\lambda s_0.\lambda s_1.\ (\mathsf{hd^g}\,s_0) ::^{\mathsf{g}} \mathsf{next}\Big((\mathsf{hd^g}\,s_1) ::^{\mathsf{g}} \big(g\circledast(\mathsf{tl^g}\,s_0)\circledast(\mathsf{tl^g}\,s_1)\big)\Big) \end{array}$$
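At the set level (via the global sections of §7), mergeᵍ is the familiar interleaving of two streams. A hedged Python sketch, with streams as (possibly infinite) iterators rather than guarded recursive values:

```python
from itertools import count, islice

def merge(s0, s1):
    # Interleave two streams: one element of s0, one of s1, then
    # recurse on both tails (the guarded recursive call of merge^g).
    # Assumes genuinely infinite streams.
    s0, s1 = iter(s0), iter(s1)
    while True:
        yield next(s0)
        yield next(s1)
```

If every element of the first stream satisfies ϕ₀ and every element of the second satisfies ϕ₁, then every element of the interleaving satisfies ϕ₀ ∨ ϕ₁, as in the refined type of mergeᵍ above.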

# **6 The Full System**

The system presented so far has only one form of fixpoints in formulae (ναϕ). We now present our full system, which also handles least fixpoints (μαϕ) and thus liveness properties. A key role is played by polynomial guarded recursive types, that we discuss first.

560 G. Jaber and C. Riba

$$(\mu\text{-}\mathbf{F})\ \dfrac{\Sigma,\alpha:A\vdash\varphi:A}{\Sigma\vdash\mu\alpha\varphi:A} \qquad (\mu^{t}\text{-}\mathbf{F})\ \dfrac{\Sigma,\alpha:A\vdash\varphi:A}{\Sigma\vdash\mu^{t}\alpha\varphi:A} \qquad (\nu^{t}\text{-}\mathbf{F})\ \dfrac{\Sigma,\alpha:A\vdash\varphi:A}{\Sigma\vdash\nu^{t}\alpha\varphi:A}$$

**Fig. 9.** Extended Formation Rules of Formulae (with α Pos ϕ and α guarded in ϕ).

$$\begin{array}{c} \vdash^{A}\varphi[\mu\alpha\varphi/\alpha]\Rightarrow\mu\alpha\varphi \qquad \dfrac{\vdash^{A}\varphi[\psi/\alpha]\Rightarrow\psi}{\vdash^{A}\mu\alpha\varphi\Rightarrow\psi} \\\\ \vdash^{A}\theta^{t+1}\alpha\varphi\Leftrightarrow\varphi[\theta^{t}\alpha\varphi/\alpha] \qquad \vdash^{A}\nu^{0}\alpha\varphi\Leftrightarrow\top \qquad \vdash^{A}\mu^{0}\alpha\varphi\Leftrightarrow\bot \\\\ \vdash^{A}\nu\alpha\varphi\Rightarrow\nu^{t}\alpha\varphi \qquad \vdash^{A}\mu^{t}\alpha\varphi\Rightarrow\mu\alpha\varphi \end{array}$$

**Fig. 10.** Extended Modal Axioms and Rules (with A a pure type and θ either μ or ν).

**Strictly Positive and Polynomial Types.** Strictly positive types (notation P <sup>+</sup>, Q<sup>+</sup>, etc.) are given by

$$P^+ ::= A \mid X \mid \blacktriangleright P^+ \mid P^+ + P^+ \mid P^+ \times P^+ \mid \mathsf{Fix}(X).P^+ \mid B \to P^+$$

where A, B are (closed) constant pure types. Strictly positive types are a convenient generalization of polynomial types. A guarded recursive type Fix(X).P(X) is polynomial if P(X) is induced by

$$P(X) ::= A \mid \blacktriangleright X \mid P(X) + P(X) \mid P(X) \times P(X) \mid B \to P(X)$$

where A, B are (closed) constant pure types. Note that if Fix(X).P(X) is polynomial, then X cannot occur to the left of an arrow (→) in P(X). We say that Fix(X).P(X) (resp. P⁺) is finitary polynomial (resp. finitary strictly positive) if B is a finite base type (see Ex. 3.1) in the above grammars. The set-theoretic counterparts of our polynomial recursive types are the exponent polynomial functors of [31], which all have final **Set**-coalgebras (see e.g. [31, Cor. 4.6.3]).

Example 6.1. For A a constant pure type, the types Strᵍ A, CoListᵍ A and Treeᵍ A, as well as Strᵍ (Str A), CoListᵍ (Str A) and Resᵍ A (with I, O constant), are polynomial. More generally, polynomial types include all recursive types Fix(X).P(X) where P(X) is of the form $\sum_{i=0}^{n} A_i \times (\blacktriangleright X)^{B_i}$ with A_i, B_i constant. The non-strictly positive recursive type Rouᵍ A of Ex. 3.2, used in Hofmann's breadth-first traversal (see e.g. [10]), is not polynomial.
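The shape Fix(X). A × ▶X of guarded streams can be mirrored concretely: a stream is a head together with a delayed tail. A Python sketch in which a thunk plays the role of the later modality ▶ (illustrative only; Python enforces no guardedness discipline, and all names here are invented):

```python
# Str^g A = Fix(X). A x |>X : a stream is a pair (head, thunk of tail),
# the thunk standing in for the later modality.
def cons(a, tail_thunk):
    return (a, tail_thunk)

def head(s):
    return s[0]

def tail(s):
    return s[1]()   # forcing the thunk steps one 'time unit'

def repeat(a):
    # well-defined because the recursive occurrence sits under a thunk,
    # mirroring a guarded fix(s). Cons^g a (next s)
    return cons(a, lambda: repeat(a))
```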

**The Full Temporal Modal Logic.** We assume given a first-order signature of iteration terms (notation t, u, etc.), with iteration variables k, ℓ, etc., and, for each iteration term t(k₁,...,kₘ) with variables as shown, a given primitive recursive function ⟦t⟧ : ℕᵐ → ℕ. We assume a term 0 for 0 ∈ ℕ and a term k + 1 for the successor function n ∈ ℕ ↦ n + 1 ∈ ℕ.

The formulae of the full temporal modal logic extend those of Fig. 5 with least fixpoints μαϕ and with approximated fixpoints μᵗαϕ and νᵗαϕ, where t is an iteration term. The additional formation rules for formulae are given in Fig. 9. We use θ as a generic notation for μ and ν. Least fixpoints μαϕ are equipped with their usual Kozen axioms. In addition, iteration formulae νᵗαϕ(α) and μᵗαϕ(α) have axioms expressing that they are indeed iterations of ϕ(α) from ⊤ and ⊥, respectively. A fixpoint logic with iteration variables was already considered in [63].

**Definition 6.2 (Full Modal Theories).** The full intuitionistic and classical modal theories (still denoted ⊢^A and ⊢^A_c) are defined by extending Def. 4.4 with the axioms and rules of Fig. 10.

Example 6.3. Least fixpoints allow us to define liveness properties. On streams and colists, we have ✸ϕ := μα. ϕ ∨ ◯α and ϕ U ψ := μα. ψ ∨ (ϕ ∧ ◯α). On trees, we have the CTL-like ∃✸ϕ := μα. ϕ ∨ (◯ₗα ∨ ◯ᵣα) and ∀✸ϕ := μα. ϕ ∨ (◯ₗα ∧ ◯ᵣα). The formula ∃✸ϕ is intended to hold on a tree if there is a finite path which leads to a subtree satisfying ϕ, while ∀✸ϕ is intended to hold if every infinite path crosses a subtree satisfying ϕ.
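On streams-as-iterators, these least fixpoints compute by unfolding until the non-recursive disjunct fires; if it never does, the search diverges, which is precisely why ✸ and U are liveness properties. A hedged Python sketch (function names are illustrative):

```python
def eventually(p, stream):
    # <>phi = mu a. phi \/ O a : unfold until p holds
    # (diverges on an infinite stream where p never holds).
    for x in stream:
        if p(x):
            return True
    return False   # only reached on finite colists

def until(p, q, stream):
    # phi U psi = mu a. psi \/ (phi /\ O a).
    for x in stream:
        if q(x):
            return True
        if not p(x):
            return False
    return False
```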

Remark 6.4. On finitary trees (as in Ex. 6.1 but with A_i, B_i finite base types), we have all formulae of the modal μ-calculus. For this fragment, satisfiability is decidable (see e.g. [16]), as is the classical theory ⊢_c, by completeness of Kozen's axiomatization [68] (see [58] for completeness results on fragments of the μ-calculus).

**The Safe and Smooth Fragments.** We now discuss two related but distinct fragments of the temporal modal logic. Both fragments directly impact the refinement type system by allowing for more typing rules.

The safe fragment plays a crucial role, because it reconciles the internal and external semantics of our system (see §7). It gives subtyping rules for ■ (Fig. 11), which make available the comonad structure of ■ on [box]ϕ when ϕ is safe.

**Definition 6.5 (Safe Formula).** Say α₁ : A₁,...,αₙ : Aₙ ⊢ ϕ : A is safe if


Note that the safe restriction imposes no condition on approximated fixpoints θᵗα. Recalling that the theory under a [box] is ⊢^A_c, the only propositional connectives accessible to ⊢^A in safe formulae are those on which ⊢^A and ⊢^A_c coincide. The formula [¬nil] = [fold][in₁]⊤ is safe. Moreover:

Example 6.6. Any formula without fixpoints nor [ev(−)] is equivalent in ⊢_c to a safe one. If ϕ is safe, then so are [hd]ϕ and [lbl]ϕ, as well as ✷ϕ, ∀✷ϕ, ∃✷ϕ and [box]✸ϕ, [box]∃✸ϕ, [box]∀✸ϕ.

**Definition 6.7 (Smooth Formula).** A formula α₁ : A₁,...,αₙ : Aₙ ⊢ ϕ : A is smooth if


Our notion of alternation freedom is adapted from [16], in which propositional (fixpoint) variables are always positive. Note that the smooth restriction imposes no further conditions on approximated fixpoints θᵗα. In the smooth fragment, greatest and least fixpoints can be thought of, respectively, as

$$\bigwedge\_{m \in \mathbb{N}} \varphi^m(\top) \qquad \text{and} \qquad \bigvee\_{m \in \mathbb{N}} \varphi^m(\bot).$$

Iteration terms allow for formal reasoning about such unfoldings. Assuming ⟦t⟧ = m ∈ ℕ, the formula νᵗαϕ(α) (resp. μᵗαϕ(α)) can be read as ϕᵐ(⊤) (resp. ϕᵐ(⊥)). This gives the rules (ν-I) and (μ-E) (Fig. 11), which allow for reductions to the safe case (see examples in §8).
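This reading can be made literal on streams-as-iterators: the iteration term becomes a bound on the number of unfoldings, with μ⁰αϕ ⇔ ⊥ and ν⁰αϕ ⇔ ⊤ as base cases. A hedged Python sketch (names invented for illustration):

```python
from itertools import islice

def ev_within(t, p, stream):
    # mu^t : 'eventually within t unfoldings' (mu^0 is false).
    return any(p(x) for x in islice(stream, t))

def always_up_to(t, p, stream):
    # nu^t : 'always during the first t unfoldings' (nu^0 is true).
    return all(p(x) for x in islice(stream, t))
```

Unlike the genuine least fixpoint, `ev_within` always terminates, which is what the reductions to the safe case exploit.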

Remark 6.8. It is well known (see e.g. [16, §4.1]) that on finitary trees (see Rem. 6.4) the alternation-free fragment is equivalent to Weak MSO (MSO with second-order variables restricted to finite sets). In the case of streams Str B (for a finite base type B), Weak MSO is in turn equivalent to the full modal μ-calculus. In particular, the alternation-free fragment contains all the flat fixpoints of [58], and thus LTL on Str B and CTL on Tree B and on Res B with I, O, B finite base types. A typical property on Tree B which cannot be expressed with alternation-free formulae is "there is an infinite path with infinitely many occurrences of b" for a fixed b : B (see e.g. [16, §2.2]).

Example 6.9. Any formula without fixpoints nor [ev(−)] is smooth. If ϕ is smooth, then so are [hd]ϕ and [lbl]ϕ, as well as ✷ϕ, ∀✷ϕ, ∃✷ϕ, ✸ϕ, ∃✸ϕ and ∀✸ϕ.

**The Full System.** We extend the types of §5 with universal quantification over iteration variables (∀k · T). The type system of §5 is extended with the rules of Fig. 11.

Example 6.10. The logical rules of Fig. 10 give the following derived typing rules (where β Pos γ):

$$(\mu\text{-I})\ \dfrac{\mathcal{E}\vdash M:\{\blacksquare A\mid[\mathsf{box}]\gamma[\mu^{t}\alpha\varphi/\beta]\}}{\mathcal{E}\vdash M:\{\blacksquare A\mid[\mathsf{box}]\gamma[\mu\alpha\varphi/\beta]\}} \qquad (\nu\text{-E})\ \dfrac{\mathcal{E}\vdash M:\{\blacksquare A\mid[\mathsf{box}]\gamma[\nu\alpha\varphi/\beta]\}}{\mathcal{E}\vdash M:\{\blacksquare A\mid[\mathsf{box}]\gamma[\nu^{t}\alpha\varphi/\beta]\}}$$

$$\dfrac{\varphi\ \text{safe}}{\{\blacksquare A\mid[\mathsf{box}]\varphi\}\equiv\blacksquare\{A\mid\varphi\}} \qquad \overline{\forall k\cdot\blacksquare T\equiv\blacksquare\forall k\cdot T}$$

**Fig. 11.** Extended (Sub)Typing Rules for Refinement Types (where k is not free in E in (∀-I) & (∀-CI), ℓ is fresh in (ν-I) & (μ-E), θαψ and γ are smooth, and β Pos γ).

# **7 Semantics**

We present the main ingredients of the semantics of our type system. We take as base the denotational semantics of guarded recursion in the topos of trees.

**Denotational Semantics in the Topos of Trees.** The topos of trees S provides a natural model of guarded recursion [13]. Formally, S is the category of presheaves over (ℕ \ {0}, ≤). In words, the objects of S are indexed sets X = (X(n))_{n>0} equipped with restriction maps r^X_n : X(n+1) → X(n). Excluding 0 from the indices is a customary notational convenience [13]. The morphisms from X to Y are families of functions f = (f_n : X(n) → Y(n))_{n>0} which commute with restriction, that is f_n ∘ r^X_n = r^Y_n ∘ f_{n+1}. As any presheaf category, S has (pointwise) limits and colimits, and is Cartesian closed (see e.g. [47, §I.6]). We write Γ : S → **Set** for the global section functor, which takes X to S[**1**, X], the set of morphisms **1** → X in S, where **1** = ({•})_{n>0} is terminal in S.

A typed term E M : T is to be interpreted in S as a morphism

$$[\![M]\!]: [\![\mathcal{E}]\!] \longrightarrow [\![T]\!]$$

where ⟦E⟧ = ⟦|T₁|⟧ × ⋯ × ⟦|Tₙ|⟧ for E = x₁ : T₁, ..., xₙ : Tₙ. In particular, a closed term ⊢ M : T is to be interpreted as a global section ⟦M⟧ ∈ Γ⟦|T|⟧. The ×/+/→ fragment of the calculus is interpreted by the corresponding structure in S. The modality ▶ is interpreted by the functor ▶ : S → S of [13]. This functor shifts indices by 1 and inserts a singleton set **1** at index 1. The term constructor next is interpreted by the natural transformation with components next_X : X → ▶X.

$$\begin{array}{rl} \{|[\pi_i]\varphi|\} &:= \{x\in\Gamma\llbracket A_0\times A_1\rrbracket \mid \pi_i\circ x\in\{|\varphi|\}\} \\ \{|[\mathsf{next}]\varphi|\} &:= \{\mathsf{next}\circ x\in\Gamma\llbracket\blacktriangleright A\rrbracket \mid x\in\{|\varphi|\}\} \\ \{|[\mathsf{fold}]\varphi|\} &:= \{x\in\Gamma\llbracket\mathsf{Fix}(X).A\rrbracket \mid \mathsf{unfold}\circ x\in\{|\varphi|\}\} \\ \{|[\mathsf{box}]\varphi|\} &:= \{x\in\Gamma\llbracket\blacksquare A\rrbracket \mid x_1(\bullet)\in\{|\varphi|\}\} \\ \{|[\mathsf{in}_i]\varphi|\} &:= \{x\in\Gamma\llbracket A_0+A_1\rrbracket \mid \exists y\in\Gamma\llbracket A_i\rrbracket,\ x=\mathsf{in}_i\circ y\ \text{and}\ y\in\{|\varphi|\}\} \\ \{|[\mathsf{ev}(\psi)]\varphi|\} &:= \{x\in\Gamma\llbracket B\to A\rrbracket \mid \forall y\in\Gamma\llbracket B\rrbracket,\ y\in\{|\psi|\}\Longrightarrow\mathsf{ev}\circ\langle x,y\rangle\in\{|\varphi|\}\} \end{array}$$
**Fig. 12.** External Semantics (for closed formulae).

The guarded fixpoint combinator fix is interpreted by the morphism fix_X : X^{▶X} → X of [13, Thm. 2.4].

The constant type modality ■ is interpreted as the comonad ΔΓ : S → S, where the left adjoint Δ : **Set** → S is the constant object functor, which takes a set S to the constant family (S)_{n>0}. In words, all components ⟦■A⟧(n) are equal to Γ⟦A⟧, and the restriction maps of ⟦■A⟧ are identities. In particular, a global section x ∈ Γ⟦■A⟧ is a constant family (x_n)_n with x_{n+1}(•) = x_n(•), describing a unique global section x_n(•) ∈ Γ⟦A⟧. We refer to [18] and [28, §D] for the interpretation of prev, box and unbox. Just note that the unit η : Id_**Set** → ΓΔ is an iso.

Together with an interpretation of guarded recursive types, this gives a denotational semantics of the pure calculus of §3. See [13,18] for details. We write fold : ⟦A[Fix(X).A/X]⟧ → ⟦Fix(X).A⟧ and unfold : ⟦Fix(X).A⟧ → ⟦A[Fix(X).A/X]⟧ for the two components of the iso ⟦Fix(X).A⟧ ≅ ⟦A[Fix(X).A/X]⟧.

**External Semantics.** Møgelberg [50] has shown that for polynomial types such as Strᵍ B with B a constant type, the set of global sections Γ⟦Strᵍ B⟧ is equipped with the usual final coalgebra structure of streams over B in **Set**. To each polynomial recursive type Fix(X).P(X), we associate a polynomial functor P_**Set** : **Set** → **Set** in the obvious way.

**Theorem 7.1 ([50] (see also [18])).** If Fix(X).P(X) is polynomial, then the set Γ⟦Fix(X).P(X)⟧ carries a final **Set**-coalgebra structure for P_**Set**.

We devise a **Set** interpretation {|ϕ|} ∈ 𝒫(Γ⟦A⟧) of formulae ⊢ ϕ : A. We rely on the (complete) Boolean algebra structure of powersets for propositional connectives, and on the Knaster–Tarski Fixpoint Theorem for fixpoints μ and ν. The interpretations of νᵗαϕ(α) and μᵗαϕ(α) (for t closed) are defined to be the interpretations of ϕ^{⟦t⟧}(⊤) and ϕ^{⟦t⟧}(⊥) respectively, where e.g. ϕ⁰(⊤) := ⊤ and ϕ^{n+1}(⊤) := ϕ(ϕⁿ(⊤)). We give the cases of the atomic modalities in Fig. 12 (where for simplicity we assume formulae to be closed). It can be checked that, when restricting to polynomial types, one gets the coalgebraic semantics of [30] (with sums as in [31]) extended to fixpoints.

**Internal Semantics of Formulae.** We would like to have adequacy w.r.t. the external semantics of formulae, namely that, given ⊢ M : {A | ϕ}, the global section ⟦M⟧ ∈ Γ⟦A⟧ satisfies {|ϕ|} ∈ 𝒫(Γ⟦A⟧) in the sense that ⟦M⟧ ∈ {|ϕ|}. But in general we can only have adequacy w.r.t. an internal semantics ⟦ϕ⟧ ∈ Sub(⟦A⟧) of formulae ⊢ ϕ : A. We sketch it here. First, Sub(X) is the (complete) Heyting algebra of subobjects of an object X of S. Explicitly, we have S = (S(n))_n ∈ Sub(X) iff for all n > 0, S(n) ⊆ X(n) and r^X_n(t) ∈ S(n) whenever t ∈ S(n + 1). For propositional connectives and fixpoints, the internal ⟦−⟧ is defined similarly to the external {|−|}, but using (complete) Heyting algebras of subobjects rather than (complete) Boolean algebras of subsets.

As for modalities, let [m] be of the form [π_i], [in_i], [next] or [fold], and assume [m]ϕ : B whenever ϕ : A. Standard topos-theoretic constructions give poset morphisms ⟦[m]⟧ : Sub(⟦A⟧) → Sub(⟦B⟧) such that ⟦[π_i]⟧ and ⟦[fold]⟧ are maps of Heyting algebras, ⟦[in_i]⟧ preserves ∨, ⊥ and ∧, while ⟦[next]⟧ preserves ∧ and ∨. With ⟦[m]ϕ⟧ := ⟦[m]⟧(⟦ϕ⟧), all the axioms and rules of Table 2 are validated for these modalities. To handle guarded recursion, it is crucial to have ⟦[next]ϕ⟧ := ▶(⟦ϕ⟧), with ⟦[next]ϕ⟧ true at time 1, independently of ⟦ϕ⟧. As a consequence, [next] and ◯ do not validate axiom (P) (Table 2), and ✸[hd]ϕ can "lie" about the next time step. We let ⟦[box]ϕ⟧ := Δ({|ϕ|}).

The modality [ev(ψ)] is a bit more complex. For ψ : B and ϕ : A, the formula [ev(ψ)]ϕ is interpreted as a logical predicate in the sense of [29, §9.2 & Prop. 9.2.4]. The idea is that for a term M : {B → A | [ev(ψ)]ϕ}, the global section ev ∘ ⟨⟦M⟧, x⟩ ∈ Γ⟦A⟧ should satisfy ϕ whenever x ∈ Γ⟦B⟧ satisfies ψ. We refer to [28, §D] for details.

Both semantics are correct w.r.t. the full modal theories of Def. 6.2.

**Lemma 7.2.** If ⊢^A_c ϕ then {|ϕ|} = {|⊤|}. If ⊢^A ϕ then ⟦ϕ⟧ = ⟦⊤⟧.

**The Safe Fragment.** For α (positive and) guarded in ϕ, the internal semantics of θαϕ is somewhat meaningless because S has unique guarded fixpoints [13, §2.5]. In particular, the typing ⊢ fix(s).Consᵍ a s : {Strᵍ A | ✸ϕ} for arbitrary a : A and ⊢ ϕ : Strᵍ A (extending §2) is indeed verified by the S semantics ⟦−⟧. This prevents adequacy w.r.t. the external semantics in general. But adequacy is possible for safe formulae, since in this case we have:

**Proposition 7.3.** If ⊢ ϕ : A is safe then {|ϕ|} = Γ⟦ϕ⟧.

Proposition 7.3 gives the subtyping rule {■A | [box]ϕ} ≡ ■{A | ϕ} (Fig. 11), which makes available the comonad structure of ■ on [box]ϕ when ϕ is safe. Recall that in safe formulae, implications can only occur under a [box] modality and thus in closed subformulae. It is crucial for Prop. 7.3 that infs and sups are pointwise in the subobject lattices of S, so that conjunctions and disjunctions are interpreted as in the usual classical Kripke semantics (see e.g. [47, §VI.7]). This does not hold for implications!

The second key to Prop. 7.3 is the following. For L a complete lattice, a Scott-cocontinuous function L → L is a Scott-continuous function L^op → L^op, i.e. one which preserves codirected infs. For a safe α : A ⊢ ϕ : A, the poset maps ⟦ϕ⟧ : Sub(⟦A⟧) → Sub(⟦A⟧) and {|ϕ|} : 𝒫(Γ⟦A⟧) → 𝒫(Γ⟦A⟧) are Scott-cocontinuous. The greatest fixpoint ναϕ(α) can thus be interpreted, both in **Set** and S, using Kleene's Fixpoint Theorem, as the inf of the interpretations of ϕᵐ(⊤) for m ∈ ℕ. This leads to the expected coincidence of the two semantics for safe formulae.
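On a finite lattice, the Kleene computation is effective: iterating from the top element stabilizes at the greatest fixpoint after finitely many steps. A hedged Python sketch on the powerset of a small hypothetical state set (the transition function and states are invented for illustration), computing a ✷-style "p holds forever" predicate:

```python
def gfp(f, top):
    # Kleene iteration: top, f(top), f(f(top)), ... stabilizes at the
    # greatest fixpoint of the monotone f on a finite lattice.
    s = top
    while (s2 := f(s)) != s:
        s = s2
    return s

# hypothetical 3-state system
states = {0, 1, 2}
step = {0: 1, 1: 0, 2: 2}

def always(p):
    # nu-style: q is kept iff p(q) holds and q's successor
    # survived the previous iteration.
    return gfp(lambda s: {q for q in states if p(q) and step[q] in s}, states)
```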

$$\begin{array}{lcl} x \Vdash_n \{A \mid \varphi\} &\text{iff}& x_n(\bullet) \in \llbracket\varphi\rrbracket(n) \\ x \Vdash_n \mathsf{Fix}(X).A &\text{iff}& \mathsf{unfold}\circ x \Vdash_n A[\mathsf{Fix}(X).A/X] \\ x \Vdash_n T_0 + T_1 &\text{iff}& \exists i\in\{0,1\},\ \exists y\in\Gamma\llbracket|T_i|\rrbracket,\ x = \mathsf{in}_i\circ y\ \text{and}\ y\Vdash_n T_i \\ x \Vdash_n T_0 \times T_1 &\text{iff}& \pi_0\circ x \Vdash_n T_0\ \text{and}\ \pi_1\circ x \Vdash_n T_1 \\ x \Vdash_n \mathbf{1} && \text{(always)} \\ x \Vdash_n U \to T &\text{iff}& \forall k\le n,\ \forall y\in\Gamma\llbracket|U|\rrbracket,\ y\Vdash_k U \Longrightarrow \mathsf{ev}\circ\langle x,y\rangle \Vdash_k T \\ x \Vdash_{n+1} \blacktriangleright T &\text{iff}& \exists y\in\Gamma\llbracket|T|\rrbracket,\ x = \mathsf{next}\circ y\ \text{and}\ y\Vdash_n T \\ x \Vdash_1 \blacktriangleright T && \text{(always)} \\ x \Vdash_n \blacksquare T &\text{iff}& \forall m>0,\ x_n(\bullet)\Vdash_m T \quad (\text{where } x_n(\bullet)\in\Gamma\llbracket|T|\rrbracket) \\ x \Vdash_n \forall k\cdot T &\text{iff}& x\Vdash_n T[t/k]\ \text{for all closed iteration terms } t \end{array}$$

**Fig. 13.** The Realizability Semantics.

**The Smooth Fragment.** The smooth restriction allows for continuity properties needed to compute fixpoints iteratively, following Kleene's Fixpoint Theorem. This implies the correctness of the typing rules (ν-I) and (μ-E) of Fig. 11.

**Lemma 7.4.** Given a closed smooth ναϕ(α) : A (resp. μαϕ(α) : A), the function {|ϕ|} : 𝒫(Γ⟦A⟧) → 𝒫(Γ⟦A⟧) is Scott-cocontinuous (resp. Scott-continuous). We have {|ναϕ(α)|} = ⋂_{m∈ℕ} {|ϕᵐ(⊤)|} (resp. {|μαϕ(α)|} = ⋃_{m∈ℕ} {|ϕᵐ(⊥)|}).

**The Realizability Semantics.** The correctness of the type system w.r.t. its semantics in S is proved with a realizability relation.

**Definition 7.5 (Realizability).** Given a type T without free iteration variables, a global section x ∈ Γ⟦|T|⟧ and n > 0, we define the realizability relation x ⊩ₙ T by induction on lexicographically ordered pairs (n, T) in Fig. 13.

**Lemma 7.6.** Given types T, U without free iteration variables, if x ⊩ₙ U and U ≤ T, then x ⊩ₙ T.

**Theorem 7.7 (Adequacy).** If ⊢ M : T, where T has no free iteration variable, then ⟦M⟧ ⊩ₙ T for all n > 0.

By Thm. 7.7, a program M : B → A induces a set-theoretic function Γ⟦M⟧ : Γ⟦B⟧ → Γ⟦A⟧, x ↦ ⟦M⟧ ∘ x. When B and A are polynomial (e.g. streams Str<sup>g</sup> B, Str<sup>g</sup> A with B, A constant), Møgelberg's Thm. 7.1 says that Γ⟦M⟧ is a function on the usual final coalgebra for B, A in **Set** (e.g. the set of usual streams over B and A). Moreover, if e.g. M : {Str B | [box]ψ} → {Str A | [box]ϕ}, then (modulo ΓΔ = Id**Set**) given a stream x that satisfies ψ (i.e. x ∈ {|ψ|}), the stream Γ⟦M⟧(x) satisfies ϕ (i.e. Γ⟦M⟧(x) ∈ {|ϕ|}). See §8 for examples.

# **8 Examples**

We have illustrated basic manipulations of our system in §3–6; we give further examples here. The functions used in our main examples are gathered in Table 3, with the following conventions. We use the infix notation a ::<sup>g</sup> s for Cons<sup>g</sup> a s and write []<sup>g</sup> for the empty colist Nil<sup>g</sup>. Moreover, we use some syntactic sugar for pattern matching: e.g., assuming s : CoList<sup>g</sup> A, we write case s of ([]<sup>g</sup> → N | x ::<sup>g</sup> xs → M) for case (unfold s) of (y.N | y.M[π₀(y)/x, π₁(y)/xs]). Most of the

**append** : CoList A −→ CoList A −→ CoList A
  := λs.λt. box<sub>ι</sub>(append<sup>g</sup> (unbox s) (unbox t))

**append<sup>g</sup>** : CoList<sup>g</sup> A → CoList<sup>g</sup> A → CoList<sup>g</sup> A
  := fix(g).λs.λt. case s of | []<sup>g</sup> → t | x ::<sup>g</sup> xs → x ::<sup>g</sup> (g xs (next t))

**sched** : Res A −→ Res A −→ Res A
  := λp.λq. box<sub>ι</sub>(sched<sup>g</sup> (unbox p) (unbox q))

**sched<sup>g</sup>** : Res<sup>g</sup> A −→ Res<sup>g</sup> A −→ Res<sup>g</sup> A
  := fix(g).λp.λq. case p of
       | Ret<sup>g</sup> a → Ret<sup>g</sup> a
       | Cont<sup>g</sup> k → let h = λi. let ⟨o, t⟩ = k i in ⟨o, g (next q) t⟩ in Cont<sup>g</sup> h

**diag** := λs. box<sub>ι</sub>(diag<sup>g</sup> (unbox s)) : Str(Str A) −→ Str A

**diag<sup>g</sup>** := diagaux<sup>g</sup> (λx.x) : Str<sup>g</sup>(Str A) −→ Str<sup>g</sup> A

**diagaux<sup>g</sup>** : (Str A → Str A) −→ Str<sup>g</sup>(Str A) −→ Str<sup>g</sup> A
  := fix(g).λt.λs. Cons<sup>g</sup> ((hd ◦ t)(hd<sup>g</sup> s)) (g next(t ◦ tl) (tl<sup>g</sup> s))

**fb** : CoNat −→ CoNat −→ Str Bool
  := λc.λm. box<sub>ι</sub>(fb<sup>g</sup> (unbox c) (unbox m))

**fb<sup>g</sup>** : CoNat<sup>g</sup> −→ CoNat<sup>g</sup> −→ Str<sup>g</sup> Bool
  := fix(g).λc.λm. case c of
       | Z<sup>g</sup> → ff ::<sup>g</sup> g (next m) (next(S<sup>g</sup> (next m)))
       | S<sup>g</sup> n → tt ::<sup>g</sup> g n (next m)

**extract** : Rou<sup>g</sup>(CoList<sup>g</sup> A) −→ CoList<sup>g</sup> A
  := fix(g).λc. case c of | Over<sup>g</sup> → Nil<sup>g</sup> | Cont<sup>g</sup> f → f g

**unfold** : Rou<sup>g</sup> A −→ (▶Rou<sup>g</sup> A → ▶A) −→ ▶A
  := λc. case c of | Over<sup>g</sup> → λk. k (next Over<sup>g</sup>) | Cont<sup>g</sup> f → λk. next(f k)

**bft<sup>g</sup>** := λt. extract (bftaux t Over<sup>g</sup>) : Tree<sup>g</sup> A −→ CoList<sup>g</sup> A

**bftaux** : Tree<sup>g</sup> A −→ Rou<sup>g</sup>(CoList<sup>g</sup> A) −→ Rou<sup>g</sup>(CoList<sup>g</sup> A)
  := fix(g).λt.λc. Cont<sup>g</sup> λk. (label<sup>g</sup> t) ::<sup>g</sup> unfold c (k ◦ (g (son<sup>g</sup> t)) ◦ (g (son<sup>g</sup> rt)))

**Table 3.** Code of the Examples.

functions of Table 3 are obtained from usual recursive definitions by inserting ⊛ and next at the right places. We often write ψ ↦ ϕ for [ev(ψ)]ϕ. Table 4 recaps our main examples of refinement typings, all of which (for A, B, O, I constant, I finite and ϕ, ψ safe and smooth) can be derived syntactically for the functions of Table 3. We use intermediate typings requiring iteration terms whenever a ✸ is involved. Below, "Γ⟦M⟧ satisfies ϕ" means Γ⟦M⟧ ∈ {|ϕ|} (modulo ΓΔ = Id**Set**, see §7). We refer to [28, §E] for details.

Example 8.1 (The Append Function on CoLists). Our system can derive that Γ⟦append⟧ returns a non-empty colist if one of its arguments is non-empty. Using ✸[nil] (which says that a colist is finite), we can derive that Γ⟦append⟧ returns a finite colist if its arguments are both finite. This involves the intermediate typing

$$\forall k \cdot \forall \ell \cdot \left( \left\{ \mathsf{CoList}^{\mathsf{g}} A \; \middle| \; \diamondsuit^{k} [\mathsf{nil}] \right\} \to \left\{ \mathsf{CoList}^{\mathsf{g}} A \; \middle| \; \diamondsuit^{\ell} [\mathsf{nil}] \right\} \to \left\{ \mathsf{CoList}^{\mathsf{g}} A \; \middle| \; \diamondsuit^{k+\ell} [\mathsf{nil}] \right\} \right)$$

In addition, if the first argument of Γ⟦append⟧ has an element which satisfies ϕ, then the result has an element which satisfies ϕ. The same holds if the first argument is finite while the second one has an element which satisfies ϕ [28, §E.6].
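As a set-theoretic analogy, the behaviour described by these typings can be sketched with Python generators standing in for colists (the encoding and names are ours, not the calculus of Table 3):

```python
def append(s, t):
    """Concatenate two (possibly infinite) colists, modelled as generators."""
    yield from s
    yield from t  # reached only if s is finite

def finite(*xs):
    yield from xs

def ones():
    while True:
        yield 1

# Two finite arguments give a finite result (cf. the ✸[nil] typings).
assert list(append(finite(1, 2), finite(3))) == [1, 2, 3]
# The result is non-empty as soon as the first argument is.
assert next(append(ones(), finite(0))) == 1
```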

```
Map over coinductive streams (with + either ✷, ✸, ✸✷ or ✷✸)
     map : ({B | ψ}→{A | ϕ}) −→ {Str B | [box]+[hd]ψ} −→ {Str A | [box]+[hd]ϕ}
Diagonal of coinductive streams of streams (with + either ✷ or ✸✷)
           diag : {Str(Str A) | [box]+[hd][box]✷[hd]ϕ} −→ {Str A | [box]+[hd]ϕ}
A fair stream of Booleans (adapted from [17,8])
                  fb : CoNat −→ CoNat −→ Str Bool
                  fb 0 1 : {Str Bool | [box]✷✸[hd][tt] ∧ [box]✷✸[hd][ff]}
Append on guarded recursive colists
            appendg : {CoListg A | [¬nil]} −→ CoListg A −→ {CoListg A | [¬nil]}
            appendg : CoListg A −→ {CoListg A | [¬nil]} −→ {CoListg A | [¬nil]}
Append on coinductive colists
append : {CoList A | [box]✸[hd]ϕ} −→ CoList A −→ {CoList A | [box]✸[hd]ϕ}
append : {CoList A | [box]✸[nil]} −→ {CoList A | [box]✸[hd]ϕ} −→ {CoList A | [box]✸[hd]ϕ}
append : {CoList A | [box]✸[nil]} −→ {CoList A | [box]✸[nil]} −→ {CoList A | [box]✸[nil]}
Breadth-first tree traversal
                   bftg : {Treeg C | ∀✷[lbl]ϑ} −→ {CoListg C | ✷[hd]ϑ}
(`a la [35] or with Hofmann's algorithm (see e.g. [10]))
A scheduler of resumptions (adapted from [44])
 sched : {Res A | [box]✸[Ret]} −→ {Res A | [box]✸[Ret]} −→ {Res A | [box]✸[Ret]}
 sched : {Res A | [box]✸[now]ψ} −→ {Res A | [box]✸[now]ψ} −→ {Res A | [box]✸[now]ψ}
 sched : {Res A | [box]✷✸[Ret]} −→ {Res A | [box]✷✸[Ret]} −→ {Res A | [box]✷✸[Ret]}
 sched : {Res A | [box]✷✸[out]ϑ} −→ {Res A | [box]✷✸[out]ϑ} −→ {Res A | [box]✷✸[out]ϑ}
(where ✸ is either ∀✸ or ∃✸, ✷ is either ∀✷ or ∃✷, and [out] is either [∧out] or [∨out])
```
**Table 4.** Some Refinement Typings (functions defined in Table 3).

Example 8.2 (The Map Function on Streams). The composite modalities ✷✸ and ✸✷ over streams are read resp. as "infinitely often" and "eventually always". Provided with a function f : Γ⟦B⟧ → Γ⟦A⟧ taking b ∈ Γ⟦B⟧ satisfying ψ to f(b) ∈ Γ⟦A⟧ satisfying ϕ, the function Γ⟦map⟧ on set-theoretic streams returns a stream which infinitely often (resp. eventually always) satisfies ϕ if its stream argument infinitely often (resp. eventually always) satisfies ψ [28, §E.3].
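The "infinitely often" reading can be illustrated with a Python sketch of map on generator streams (a hypothetical analogy to Γ⟦map⟧, not the typed code of Table 3):

```python
from itertools import count, islice

def smap(f, s):
    """Map a function over an infinite stream, modelled as a generator."""
    return (f(x) for x in s)

# count(0) infinitely often satisfies "even"; mapping x -> x + 1 yields a
# stream that infinitely often satisfies "odd".
out = list(islice(smap(lambda x: x + 1, count(0)), 6))
assert out == [1, 2, 3, 4, 5, 6]
```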

Example 8.3 (The Diagonal Function). Consider a stream of streams s. We have s = (s_i | i ≥ 0), where each s_i is itself a stream s_i = (s_{i,j} | j ≥ 0). The diagonal of s is then the stream (s_{i,i} | i ≥ 0). Note that s_{i,i} = hd(tl^i(hd(tl^i(s)))). Indeed, tl^i(s) is the stream of streams (s_k | k ≥ i), so that hd(tl^i(s)) is the stream s_i, and tl^i(hd(tl^i(s))) is the stream (s_{i,k} | k ≥ i). Taking its head thus gives s_{i,i}. In the diag function of Table 3, the auxiliary higher-order function diagaux<sup>g</sup> iterates the coinductive tl over the head of the stream of streams s. We write ◦ for function composition, so that assuming s : Str<sup>g</sup>(Str A) and t : Str A → Str A, we have (on the coinductive type Str A), (hd<sup>g</sup> s) : Str A and

(hd ◦ t) : Str A → A   (hd ◦ t)(hd<sup>g</sup> s) : A   (t ◦ tl) : Str A → Str A

The expected refinement types for diag (Table 4) say that if its argument is a stream whose component streams all satisfy ✷ϕ, then Γ⟦diag⟧ returns a stream whose elements all satisfy ϕ. Also, if the argument of Γ⟦diag⟧ is a stream such that eventually all its component streams satisfy ✷ϕ, then it returns a stream which eventually always satisfies ϕ. See [28, §E.4] for details.
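Set-theoretically, the recipe s_{i,i} = hd(tl^i(hd(tl^i(s)))) amounts to the following Python sketch, with generators standing in for streams (the encoding is ours):

```python
from itertools import count, islice

def diag(s):
    """Diagonal of a stream of streams: yields s_{i,i} for i = 0, 1, 2, ..."""
    for i, row in enumerate(s):           # row is s_i = hd(tl^i(s))
        yield next(islice(row, i, None))  # hd(tl^i(s_i)) = s_{i,i}

# Row i is the stream i, i+1, i+2, ..., so the diagonal is 0, 2, 4, ...
rows = (count(i) for i in count(0))
assert list(islice(diag(rows), 5)) == [0, 2, 4, 6, 8]
```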

Example 8.4 (A Fair Stream of Booleans). The non-regular stream (fb 0 1), adapted from [17,8], is of the form ff·tt·ff·tt²·ff ⋯ ff·tt^m·ff·tt^{m+1}·ff ⋯. It thus contains infinitely many tt's and infinitely many ff's. We indeed have (see [28, §E.5] for details) (fb 0 1) : {Str Bool | [box]✷✸[hd][tt] ∧ [box]✷✸[hd][ff]}.
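Set-theoretically, (fb 0 1) can be sketched as a Python generator (an analogy of ours, not the guarded code of Table 3):

```python
from itertools import islice

def fb(c, m):
    """Fair Boolean stream: emit False when the counter hits zero and restart
    with a larger counter; emit True while counting down."""
    while True:
        if c == 0:
            yield False
            c, m = m, m + 1
        else:
            yield True
            c -= 1

# ff · tt · ff · tt^2 · ff · ...
prefix = list(islice(fb(0, 1), 6))
assert prefix == [False, True, False, True, True, False]
```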

Example 8.5 (Resumptions). The type of resumptions Res<sup>g</sup> A (see Ex. 3.2) is adapted from [44]. Its guarded constructors are

$$\begin{array}{l} \mathsf{Ret}^{\mathsf{g}} := \lambda a.\, \mathsf{fold}(\mathsf{in}_0\, a) : A \longrightarrow \mathsf{Res}^{\mathsf{g}} A \\ \mathsf{Cont}^{\mathsf{g}} := \lambda k.\, \mathsf{fold}(\mathsf{in}_1\, k) : (\mathsf{I} \to (\mathsf{O} \times \blacktriangleright\mathsf{Res}^{\mathsf{g}} A)) \longrightarrow \mathsf{Res}^{\mathsf{g}} A \end{array}$$

Ret<sup>g</sup>(a) represents a computation which returns the value a : A, while Cont<sup>g</sup>⟨f, k⟩ (with ⟨f, k⟩ : I → (O × ▶Res<sup>g</sup> A)) represents a computation which on input i : I outputs f i : O and continues with k i : ▶Res<sup>g</sup> A. Given p, q : Res<sup>g</sup> A, the scheduler (sched<sup>g</sup> p q), adapted from [44], first evaluates p. If p returns, then the whole computation returns, with the same value. Otherwise, p evaluates to, say, Cont<sup>g</sup>⟨f, k⟩. Then (sched<sup>g</sup> p q) produces a computation which on input i : I outputs f i and continues with (sched<sup>g</sup> q (k i)), thus switching arguments.
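Ignoring guardedness, the behaviour of sched<sup>g</sup> can be mimicked with ordinary Python closures (a hypothetical encoding of resumptions; the tags "ret"/"cont" and helper names are ours):

```python
def ret(a):
    return ("ret", a)

def cont(k):
    # k maps an input i to a pair (output, continuation resumption)
    return ("cont", k)

def sched(p, q):
    """Interleave two resumptions, switching arguments after each step."""
    if p[0] == "ret":
        return p  # the whole computation returns p's value
    k = p[1]
    def h(i):
        o, t = k(i)
        return (o, sched(q, t))
    return cont(h)

# A resumption that echoes its input once, then returns "done".
echo = cont(lambda i: (i, ret("done")))
r = sched(echo, ret("other"))
o, t = r[1]("x")
assert o == "x" and t == ("ret", "other")
```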

Let I be a finite base type (so that Res<sup>g</sup> A is finitary polynomial). Let ψ : A, ϑ : O and ϕ : Res<sup>g</sup> A. We have the following formulae (where i : I):

$$\begin{array}{ll} [\mathsf{Ret}] := [\mathsf{fold}][\mathsf{in}_0]\top & [\mathsf{out}_i]\vartheta := [\mathsf{fold}][\mathsf{in}_1]([i] \mapsto [\pi_0]\vartheta) \\ {[\mathsf{now}]}\psi := [\mathsf{fold}][\mathsf{in}_0]\psi & \bigcirc_i \varphi := [\mathsf{fold}][\mathsf{in}_1]([i] \mapsto [\pi_1][\mathsf{next}]\varphi) \end{array}$$

The formula [Ret] (resp. [now]ψ) holds on a resumption which immediately returns (resp. returns with a value satisfying ψ), and we have Ret<sup>g</sup> : A → {Res<sup>g</sup> A | [Ret]} and Ret<sup>g</sup> : {A | ψ} → {Res<sup>g</sup> A | [now]ψ}. Moreover, the typings

$$\begin{array}{l} \mathsf{Cont}^{\mathsf{g}} : \{\mathsf{I} \to (\mathsf{O} \times \blacktriangleright\mathsf{Res}^{\mathsf{g}} A) \mid [i] \mapsto [\pi_0]\vartheta\} \longrightarrow \{\mathsf{Res}^{\mathsf{g}} A \mid [\mathsf{out}_i]\vartheta\} \\ \mathsf{Cont}^{\mathsf{g}} : \{\mathsf{I} \to (\mathsf{O} \times \blacktriangleright\mathsf{Res}^{\mathsf{g}} A) \mid [i] \mapsto [\pi_1][\mathsf{next}]\varphi\} \longrightarrow \{\mathsf{Res}^{\mathsf{g}} A \mid \bigcirc_i \varphi\} \end{array}$$

express that [out_i]ϑ : Res<sup>g</sup> A is satisfied by Cont<sup>g</sup>⟨f, k⟩ if f i satisfies ϑ, and that ◯_i ϕ : Res<sup>g</sup> A is satisfied by Cont<sup>g</sup>⟨f, k⟩ if k i satisfies [next]ϕ. Since I is a finite base type, it is possible to quantify over its inhabitants. We thus obtain CTL-like variants of ✷ and ✸ (Ex. 4.3.(b) and Ex. 6.3). Namely:


Our system can prove that Γ⟦sched⟧ returns in finite time when so do its arguments, either along some or along any sequence of inputs. We moreover have the expected ✷✸ properties for all possible (consistent) combinations of ∃/∀ and [Ret]/[∨out]/[∧out] (Table 4, with ψ : A, ϑ : O safe and smooth) [28, §E.7].

Example 8.6 (Breadth-First Traversal). The function bft<sup>g</sup> of Table 3 (where g stands for λx.g x) implements Martin Hofmann's algorithm for breadth-first tree traversal. This algorithm involves the higher-order type Rou<sup>g</sup> A (see Ex. 3.2) with constructors Over<sup>g</sup> := fold(in₀) : Rou<sup>g</sup> A and

$$\mathsf{Cont}^{\mathsf{g}} := \lambda f.\, \mathsf{fold}(\mathsf{in}_1 f) : ((\blacktriangleright\mathsf{Rou}^{\mathsf{g}} A \to \blacktriangleright A) \to A) \to \mathsf{Rou}^{\mathsf{g}} A$$

We refer to [10] for explanations. Consider a formula ϕ : A. We can lift ϕ to

$$[\mathsf{Rou}]\varphi := \nu\alpha.\ [\mathsf{fold}][\mathsf{in}_1](([\mathsf{next}]\alpha \mapsto [\mathsf{next}]\varphi) \mapsto \varphi) : \mathsf{Rou}^{\mathsf{g}}A$$

We then easily derive the expected refinement type of bft<sup>g</sup> (Table 4, where ϑ : C). Assume that ϑ is safe. On the one hand, it is not clear what the meaning of [Rou]ϑ is, because it is an unsafe formula over a non-polynomial type. On the other hand, the type of bft<sup>g</sup> in Table 4 has its standard expected meaning (namely: if all nodes of a tree satisfy ϑ then so do all elements of its traversal), because the types Tree<sup>g</sup> C, CoList<sup>g</sup> C are polynomial and the formulae ∀✷[lbl]ϑ, ✷[hd]ϑ are safe. Hence, our system can prove standard statements via detours through non-standard ones, which illustrates its compositionality. We have the same typing for a usual breadth-first tree traversal with forests (à la [35]). See [28, §E.8].
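The standard statement behind the typing of bft<sup>g</sup> can be illustrated on finite trees with an ordinary queue-based level-order traversal (a Python sketch of the specification only, not of Hofmann's routine-based algorithm):

```python
from collections import deque

def bft(tree):
    """Level-order (breadth-first) traversal of a finite tree (label, children)."""
    out, queue = [], deque([tree])
    while queue:
        label, children = queue.popleft()
        out.append(label)
        queue.extend(children)
    return out

tree = (1, [(2, [(4, [])]), (3, [])])
assert bft(tree) == [1, 2, 3, 4]
# If every label satisfies a predicate, so does every element of the traversal.
assert all(x > 0 for x in bft(tree))
```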

# **9 Related Work**

Type systems based on guarded recursion have been designed to enforce properties of programs handling coinductive types, like causality [45], productivity [5,50,18,6,25,24], or termination [62]. These properties are captured by the type systems, meaning that all well-typed programs satisfy these properties.

In an initially different line of work, temporal logics have been used as type systems for functional reactive programming (FRP), starting from LTL [32,33] to the intuitionistic modal μ-calculus [17]. These works follow the Curry-Howard "proof-as-programs" paradigm, and reflect in the programming languages the constructions of the temporal logic.

The FRP approach has been adapted to guarded recursion, e.g. for the absence of space leaks [44], or the absence of time leaks, with the Fitch-style system of [7]. This more recently led [8] to consider liveness properties with an FRP approach based on guarded recursion. In this system, the guarded λ-calculus (presented in a Fitch-style type system) is extended with a delay modality (written ◯) together with an "until" type A Until B. Following the Curry-Howard correspondence, A Until B is eliminated with a specific recursor, based on the usual unfolding of Until in LTL, and distinct from the guarded fixpoint operator.

In these Curry-Howard approaches, temporal operators are wired into the structure of types. This means that there is no separation between the program and the proof that it satisfies a given temporal property. Since different type formers come with different program constructs, different temporal specifications for the same program may lead to different actual code.

We have chosen a different approach, based on refinement types, in which the structure of formulae is not reflected in the structure of types. This allows our examples to be mostly written in a usual guarded recursive fashion (see Table 3). Of course, we do use the □ modality at the type level as a separation between safety and liveness properties. But different liveness properties (e.g. ✸, ✸✷, ✷✸) are uniformly handled with the same □-type, which is moreover the expected one in the guarded λ-calculus [18].

Higher-order model checking (HOMC) [54,39] has been introduced to check automatically that higher-order recursion schemes, a simple form of higher-order programs with finite data-types, satisfy a μ-calculus formula. Automatic verification of higher-order programs with infinite data-types (integers) has been explored for safety [40], termination [46], and more generally ω-regular [51] properties. In the presence of infinite datatypes, semi-automatic extensions of HOMC have recently been proposed [69]. In contrast with this paper, most HOMC approaches do not consider input-output behaviors on coalgebraic data. A notable exception is [41,23], but it does not handle higher-order functions (such as map), nor polynomial types such as Str(Str A) (Ex. 8.3), nor non-positive types such as Rou A (Ex. 8.6), and it imposes a strong linearity constraint on pattern matching.

Event-driven approaches consider effects generating streams of events [61], which can be checked for temporal properties with algorithms based on (HO)MC [26,27], or, in the presence of infinite datatypes, with refinement type systems [42,53]. Our iteration terms can be seen as oracles, as required by [42] to handle liveness properties, but we do not know whether they allow for the non-regular specifications of [53]. While such approaches can handle infinite data types with good levels of automation, they have neither coinductive types nor branching-time properties, such as the temporal specification of sched on resumptions (Ex. 8.5).

Along similar lines, branching was approached via non-determinism in [64], which also handles universal and existential properties on traces. This framework can handle CTL-like properties of the form ∃/∀-✷/✸ (with our notation of Ex. 8.5), but not nested combinations of these (as e.g. ∃✷∀✸ for sched in Ex. 8.5). It moreover does not handle coinductive types.

# **10 Conclusion and Future Work**

We have presented a refinement type system for the guarded λ-calculus, with refinements expressing temporal properties stated as (alternation-free) μ-calculus formulae. As we have seen, the system is general enough to prove precise behavioral input/output properties of coinductively-typed programs. Our main contribution is to handle liveness properties in the presence of guarded recursive types. As seen in §2, this comes with inherent difficulties. In general, once guarded recursive functions are packed into coinductive ones using □, the logical reasoning is done in our system directly on top of programs, following their shape but requiring no further modification. We thus believe we have achieved some separation between programs and proofs.

We have provided several examples. While they demonstrate the flexibility of our system, they also show that more abstraction would be welcome when proving liveness properties. In addition, our system lacks the expressiveness to prove, e.g., liveness properties of breadth-first tree traversals.

We believe that our approach could be generalized to other programming languages with inductive or coinductive types. The key requirements are: (1) modalities in the temporal logic to navigate through the types of the language, (2) a semantics indicating when a program satisfies a formula of the temporal logic, which is sufficiently close to the set-theoretic one for liveness properties to retain their expected meaning, and (3) inference rules to reason over this realizability semantics.

Extensions of the guarded λ-calculus with dependent types have been explored [14,11,6,24]. It may be possible to extend our work to these systems. This would require working in a Fitch-style presentation of the ▶ modality, as in [7,12], since it is not known how to extend delayed substitutions to dependent types while retaining decidability of type-checking [15]. Also, it is appealing to investigate the generalization of our approach to sized types [1], in which guarded recursive types are representable [67].

We plan to investigate type checking. For instance, in a decidable fragment like the μ-calculus on streams, one can check that a function of type {Str<sup>g</sup> C | ✸✷[hd]ϑ} → {Str<sup>g</sup> B | ✸✷[hd]ψ} can be postcomposed with one of type {Str<sup>g</sup> B | ✷✸[hd]ψ} → {Str<sup>g</sup> A | ✷✸[hd]ϕ} (since ✸✷[hd]ψ ⇒ ✷✸[hd]ψ). Hence, we expect that some automation is possible for fragments of our logic. In the presence of iteration terms, arithmetic extensions of the μ-calculus [37,38] may provide interesting backends. Another direction is the interaction with HOMC. If (say) a stream over A is representable in a suitable format, one may use HOMC to check whether it can be an argument of a function expecting e.g. a stream of type {Str<sup>g</sup> A | ✷✸[hd]ϕ}. This might provide automation for fragments of the guarded λ-calculus. Besides, the combination of refinement types with automatic techniques like predicate abstraction [57], abstract interpretation [34], or SMT solvers [66,65] has been particularly successful. More recently, the combination of refinement type inference with HOMC has been investigated [59].

We would like to explore temporal specification of general, effectful programs. To do so, we wish to develop the treatment of the coinductive resumptions monad [55], that provides a general framework to reason on effectful computations, as shown by interaction trees [70]. It would be interesting to study temporal specifications we could give to effectful programs encoded in this setting. To formalize reasoning on such examples, we would like to design an embedding of our system in a proof assistant like Coq.

Following [3], guarded recursion has been used to abstract the reasoning on step-indexing [4] that underlies the design of Kripke logical relations [2] for typed higher-order effectful programming languages. Program logics for reasoning on such logical relations [19,20] use this representation of step-indexing via guarded recursion. It is also found in Iris [36], a framework for higher-order concurrent separation logic. It would be interesting to explore the incorporation of temporal reasoning, especially liveness properties, into such logics.

# **References**


Foundations of Software Science and Computation Structures. pp. 20–35. Springer Berlin Heidelberg, Berlin, Heidelberg (2016)


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Query Lifting**

**Language-integrated query for heterogeneous nested collections**

Wilmer Ricciotti<sup>1</sup> and James Cheney<sup>1,2</sup>

<sup>1</sup> Laboratory for Foundations of Computer Science, University of Edinburgh, Edinburgh, United Kingdom (research@wilmer-ricciotti.net, jcheney@inf.ed.ac.uk)
<sup>2</sup> The Alan Turing Institute, London, United Kingdom

**Abstract.** Language-integrated query based on comprehension syntax is a powerful technique for safe database programming, and provides a basis for advanced techniques such as query shredding or query flattening that allow efficient programming with complex nested collections. However, the foundations of these techniques are lacking: although SQL, the most widely-used database query language, supports heterogeneous queries that mix set and multiset semantics, these important capabilities are not supported by known correctness results or implementations, which assume homogeneous collections. In this paper we study language-integrated query for a heterogeneous query language NRCλ(Set, Bag) that combines set and multiset constructs. We show how to normalize and translate queries to SQL, and develop a novel approach to querying heterogeneous nested collections, based on the insight that "local" query subexpressions that calculate nested subcollections can be "lifted" to the top level, analogously to lambda-lifting for local function definitions.

**Keywords:** language-integrated query · nested relations · multisets

# **1 Introduction**

Since the rise of relational databases as important software components in the 1980s, it has been widely appreciated that database programming is hard [13]. Databases offer efficient access to flat tabular data using declarative SQL queries, a computational model very different from that of most general-purpose languages. To get the best performance from the database, programmers typically need to formulate important parts of their program's logic as queries, thus effectively programming in two languages: their usual general-purpose language (e.g. Java, Python, Scala) and SQL, with the latter query code typically constructed as unchecked, dynamic strings. Programming in two languages is more than twice as difficult as programming in one language [35]. The result is a hybrid programming model where important parts of the program's functionality are not statically checked and may lead to run-time failures, or worse, vulnerabilities such as SQL injection attacks. This undesirable state of affairs was recognized by Copeland and Maier [13] who coined the term impedance mismatch for it.

Though higher-level wrapper libraries and tools such as object-relational mappings (ORM) can help ameliorate the impedance mismatch, they often come at a price of performance and lack of transparency, as high-level operations on in-memory objects representing database data are not always mapped efficiently to queries [45]. An alternative approach, which has almost as long a history as the impedance mismatch problem itself, is to elevate queries in the host language from unchecked strings to a typed, domain-specific sublanguage, whose interactions with the rest of the program can be checked and which can be mapped to database queries safely while providing strong guarantees. This approach is nowadays typically called language-integrated query, following Microsoft's successful LINQ extensions to .NET languages such as C# and F# [36,49]. It is ultimately based on Trinder and Wadler's insight that database queries can be modeled by a form of monadic comprehension syntax [50].

Comprehension-based query languages were placed on strong foundations in the database community in the 1990s [3,4,40,55,33]. A key insight due to Paredaens and van Gucht [40] is that although comprehension-based queries can manipulate nested collections, any expression whose input and output are flat collections (i.e. tables of records without other collections nested inside field values) can always be translated to an equivalent query only using flat relations (i.e. can be expressed in an SQL-like language). Wong [55] subsequently generalized this result and gave a constructive proof, in which the translation from nested to flat queries is accomplished through a strongly normalizing rewriting system.
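The flat-flat result can be illustrated with ordinary Python comprehensions (our own analogy with made-up tables; the formal result concerns NRC queries and SQL):

```python
# Toy "tables": people (name, person id) and orders (person id, item).
people = [("ann", 1), ("bob", 2)]
orders = [(1, "tea"), (2, "coffee"), (1, "scone")]

def with_nested():
    # intermediate value: for each person, the nested list of their orders
    nested = [(n, [item for (pid2, item) in orders if pid2 == pid])
              for (n, pid) in people]
    return [(n, item) for (n, items) in nested for item in items]

def flat():
    # equivalent flat query, with the nested intermediate collection rewritten away
    return [(n, item) for (n, pid) in people
            for (pid2, item) in orders if pid2 == pid]

# Flat input, flat output: both formulations agree.
assert with_nested() == flat()
```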

Wong's work has informed a number of successful implementations, such as the influential Kleisli system [56] for biomedical data integration, and the Links programming language [12]. Although the implementation of LINQ in C# and F# was not directly based on normalization, Cheney et al. [7] showed that normalization can be performed as a pre-processing step to improve both reliability and performance of queries, and guarantee that a well-formed query expression evaluates to (at most) one equivalent SQL expression at run time.

Comprehension-based language-integrated query also forms the basis for libraries such as Quill for Scala [41] and Database-Supported Haskell [21]. Most recently, language-integrated query has been extended further to support efficient execution of queries that construct nested results [25,8,21,53], by translating such queries to a bounded number of flat queries. This technique, currently implemented in Links and DSH, has several benefits, for example enabling efficient provenance tracking in queries [17,47]. Fowler et al. [19] showed that in some cases, Links's support for nested query results decreased both the number of queries issued and the total query evaluation time by an order of magnitude or more, compared to a Java database application. Unfortunately, there is still a gap between the theory and practice of language-integrated query. Widely-used and practically important SQL features that mix set and multiset collections, such as duplicate elimination, are supported by some implementations, but without guarantees regarding correctness or reliability. So far, such results have only been proved for special cases [7,8], typically for homogeneous queries operating on one uniform collection type. For example, in Links, queries have multiset semantics and cannot use duplicate elimination or set-valued operations. To the best of our knowledge, the questions of how to correctly translate flat or nested heterogeneous queries to SQL are open problems.

In this paper, we solve both open problems. We study a heterogeneous query language NRCλ(Set, Bag), which was introduced and studied in our recent work [42]. We have previously extended the key results on query normalization to NRCλ(Set, Bag) [43], but unlike the homogeneous case, the resulting normal forms do not directly correspond to SQL. In this paper, we first show how flat NRCλ(Set, Bag) queries can be translated to SQL, and we then develop a new approach for evaluating queries over nested heterogeneous collections. The key (and, to us at least, surprising) insight is to recognize that these two subproblems are really just different facets of one problem. That is, when translating flat NRCλ(Set, Bag) queries to SQL, the main obstacle is how to deal with query expressions that depend on local variables; when translating nested NRCλ(Set, Bag) queries to equivalent flat ones, the main obstacle is also how to deal with query expressions that depend on local variables. We solve this problem by observing that such query subexpressions can be lifted, analogously to lambda-lifting of local function definitions in functional programming [30], by abstracting over their free variables. Unlike lambda-lifting, however, we lift such expressions by converting them to tabular functions, or graphs, which can be calculated using database query constructs.
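The graph-based lifting can be caricatured with Python comprehensions (a hypothetical illustration with made-up tables; the formal translation in the paper works on NRCλ(Set, Bag) terms):

```python
cand = [{"cid": 1, "name": "DJT"}, {"cid": 2, "name": "JRB"}]
pres = [{"cid": 1, "did": 10}, {"cid": 1, "did": 10}, {"cid": 2, "did": 11}]

def nested():
    # the inner subquery mentions the bound ("local") variable c
    return [(c["name"], {p["did"] for p in pres if p["cid"] == c["cid"]})
            for c in cand]

def lifted():
    # "graph" of the inner subquery: one flat top-level table,
    # keyed by the value of its free variable c.cid
    graph = [(c["cid"], p["did"]) for c in cand for p in pres
             if p["cid"] == c["cid"]]
    # stitch the nested result back together from the flat graph
    return [(c["name"], {d for (cid, d) in graph if cid == c["cid"]})
            for c in cand]

assert nested() == lifted()
```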

The remainder of this paper presents our contributions as follows:


# **2 Overview**

In this section we sketch our approach. We use Links syntax [12], which differs in superficial respects from the core calculus in the rest of the paper but is more readable. We rely without further comment on existing capabilities of language-integrated query in Links, which are described elsewhere [11,34,8]. Suppose, hypothetically, we are interested in certain presidential candidates and prescription drugs they may be taking<sup>3</sup>. In Links, an expression querying a small database of presidential candidates and their drug prescriptions can be written as follows:

<sup>3</sup> For example, to see whether drug interactions might explain erratic behavior such as rage tweeting, creeping authoritarianism, or creepiness more generally.

**Fig. 1.** Input tables Cand, Pres, Drug, intermediate result of Q_F and result of Q1.

```
Q0 = for (c <- Cand, p <- Pres, d <- Drug)
     where (c.cid == p.cid && p.did == d.did)
     [(name=c.name, drug=d.drug)]
```

Some (totally fictitious and not legally actionable) example data is shown in Figure 1; note that the prescriptions table Pres is a multiset containing duplicate entries. Executing this query in Links results in the following SQL query:

```
SELECT c.name, d.drug
FROM Cand c, Pres p, Drug d
WHERE c.cid = p.cid AND p.did = d.did
```
In Links, query results from the database are mapped back to list values nondeterministically, and the result of the above query Q0 will be a list containing two copies of the tuple (DJT, adderall) and one copy of each of the tuples (DJT, hydrochloroquine) and (JRB, caffeine). If we are just interested in which candidates take which drugs, and not in how many times each drug was taken, we want to remove these duplicates. This can be accomplished in a basic SQL query using the DISTINCT keyword after SELECT. Currently there is no way to generate queries involving DISTINCT in Links, so this duplicate elimination can only be performed in-memory. While this is not hard to do when the duplicate elimination happens at the end of the query, it is not as clear how to handle deduplication operations correctly in arbitrary places inside queries. Furthermore, SQL has several other operations that can have either set or multiset semantics, such as UNION and EXCEPT: how should they be handled?

To study this problem we introduced a core calculus NRC<sub>λ</sub>(Set, Bag) [42] (reviewed in the next section) in which there are two collection types, sets and multisets (or bags); duplicate elimination maps a multiset to a set with the same elements, and promotion maps a set to the least multiset with the same elements.

We considered, but were not previously able to solve, two problems in the context of NRC<sub>λ</sub>(Set, Bag), which are addressed in this paper. First, the fundamental results regarding normalization and translation to SQL have been studied only for homogeneous query languages, with collections consisting of either sets, bags, or lists. We recently extended the normalization results to NRC<sub>λ</sub>(Set, Bag) [43], but the resulting normal forms do not correspond directly to SQL queries if operations such as deduplication, promotion, or bag difference are present. Second, query expressions that construct nested collections cannot be translated directly to SQL and can be very expensive to execute in-memory using nested loops, leading to the N + 1 query problem (or query avalanche problem [26]), in which one query is performed for the outer loop and then another N queries are performed, one per iteration of the inner loop. Some techniques have been developed for translating nested queries to a fixed number of flat queries, but to date they either handle only homogeneous set or bag collections [54,8], or lack detailed correctness proofs [26,52].

Regarding the first problem, the closest related work is by Libkin and Wong [33], who studied and related the expressiveness of comprehension-based homogeneous set and bag query languages, but did not consider their heterogeneous combination or translation to SQL. The following query illustrates the fundamental obstacle:

```
Q1 = for (c <- Cand)
     for (d <- dedup(for (p <- Pres, d <- Drug)
                     where (c.cid == p.cid && p.did == d.did)
                     [d.drug]))
     [(name=c.name, drug=d)]
```
This query is similar to Q0, but eliminates duplicates among the drugs for each candidate. The query contains a duplicate elimination operation (dedup) applied to another query subexpression that refers to c, which is introduced in an earlier generator. This is not directly supported in classic SQL: by default the subqueries in FROM clauses cannot refer to tuple variables introduced by earlier parts of the FROM clause. In fact, this query is expressible in SQL:1999 using the LATERAL keyword, which does allow such sideways information-passing:

```
SELECT c.name,d.drug
FROM Cand c, LATERAL (SELECT DISTINCT d.drug
                      FROM Pres p, Drug d
                      WHERE p.cid = c.cid AND p.did = d.did) d
```
(Without the LATERAL keyword, this query is not well-formed SQL.) However, such queries have only recently become widely supported, so they are not available on legacy databases; even when supported, they are not typically optimized effectively. For example, PostgreSQL will evaluate this query as a nested loop, with quadratic complexity or worse.

Regarding the second problem, Van den Bussche [54] showed that any query returning nested set collections can be simulated by n flat queries, where n is the number of occurrences of the set collection type in the result. However, this translation has not been used as the basis for a practical system to our knowledge, and does not respect multiset semantics. Cheney et al. [8] provided an analogous shredding translation for nested multiset queries, but translated to a richer target language (including SQL:1999 features such as ROW NUMBER) and did not handle operations such as multiset difference or duplicate elimination. Thus, neither approach handles the full expressiveness of a heterogeneous query language over bags and sets. The following query illustrates the fundamental obstacle:

```
Q2 = for (x <- Cand)
     [(name=x.name, drugs=dedup(for (p <- Pres, d <- Drug)
                             where (x.cid == p.cid and p.did == d.did)
                             [d.drug]))]
```
Much like Q1, Q2 builds a multiset of pairs (name, drugs), but here drugs is a set of all of the drugs taken by candidate name. Such a query is, of course, not even syntactically expressible in SQL, because it returns a nested collection; it is not expressible in previous work on nested query evaluation either, because the result is a multiset of records, one component of which is a set.

We will now illustrate how to translate Q1 to a plain SQL query (not using LATERAL), and how to translate Q2 to two flat queries such that the nested result can be constructed easily from their flat results. First, note that we can rewrite both queries as follows, introducing an abbreviation F(x) for a query subexpression parameterized by x:

```
F(x) = for (p <- Pres, d <- Drug)
       where (x.cid == p.cid and p.did == d.did)
       [d.drug]
Q1 = for (c <- Cand) for (d <- dedup(F(c))) [(name=c.name, drug=d)]
Q2 = for (c <- Cand) [(name=c.name, drugs=dedup(F(c)))]
```
Next, observe that the set of all possible values for x appearing in some call to F(x) is finite, and can even be computed by a query. Therefore, we can write a closed query Q_F that builds a lookup table calculating the graph of F (or at least as much of it as is needed to evaluate the queries) as follows:

```
Q_F = dedup(for (x <- Cand, y <- F(x)) [(in=x, out=y)])
```

Notice that the use of deduplication here is essential to define Q_F correctly: if we did not deduplicate, there would be repeated tuples in Q_F, leading to incorrect results later. If we inline and simplify F(x) in the above query, we get the following:

```
Q_F' = dedup(for (x <- Cand, y <- Pres, z <- Drug)
             where (x.cid == y.cid && y.did == z.did)
             [(in=x,out=z.drug)])
```
Finally, we may replace the call to F(x) in Q1 with a lookup into Q_F', as follows:

```
Q1' = for (c <- Cand, f <- Q_F') where (c == f.in)
      [(name=c.name, drug=f.out)]
```
This expression may be translated directly to SQL, because the argument to dedup is now closed:

```
SELECT c.name,f.drug
FROM Cand c, (SELECT DISTINCT x.name,x.cid,z.drug
              FROM Cand x, Pres y, Drug z
              WHERE x.cid = y.cid AND y.did = z.did) f
WHERE c.cid = f.cid AND c.name = f.name
```
**Fig. 2.** Intermediate results of Q_21, Q_22 and result of Q2.

Although this query looks a bit more complex than the one given earlier using LATERAL, it can be optimized more effectively, for example PostgreSQL generates a query plan that uses a hash join, giving quasi-linear complexity.
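To make the lifting step concrete, the following Python sketch simulates Q_F' and Q1' over a small dataset in the spirit of Figure 1. The specific cid and did values are invented for illustration, and Python sets and lists stand in for the set and bag semantics of the queries.

```python
# Hypothetical rows in the spirit of Figure 1; the cid/did values are invented.
Cand = [(1, 'DJT'), (2, 'JRB')]                      # (cid, name)
Pres = [(1, 101), (1, 101), (1, 102), (2, 103)]      # (cid, did): a bag, with duplicates
Drug = [(101, 'adderall'), (102, 'hydrochloroquine'), (103, 'caffeine')]

# Q_F': the deduplicated graph of F, tabulating (in, out) pairs.
Q_F = {(c, drug)
       for c in Cand
       for (pcid, pdid) in Pres
       for (did, drug) in Drug
       if c[0] == pcid and pdid == did}

# Q1': replace the lateral call F(c) by a lookup (a join) into the graph.
Q1 = [(c[1], out) for c in Cand for (inp, out) in Q_F if c == inp]

# Duplicates among a candidate's drugs are gone, but each candidate still
# contributes one row per distinct drug.
assert sorted(Q1) == [('DJT', 'adderall'), ('DJT', 'hydrochloroquine'),
                      ('JRB', 'caffeine')]
```

Because Q_F is built with a set comprehension, the repeated Pres row contributes only one (in, out) pair, mirroring the role of dedup in the query above.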

On the other hand, to deal with Q2, we refactor it into two closed, flat queries Q_21 and Q_22, and an expression Q2' that builds the nested result from their flat results (illustrated in Figure 2):

```
Q_21 = for (x <- Cand) [(name=x.name, drugs=x)]
Q_22 = Q_F
Q2' = for (x <- Q_21)
      [(name=x.name,
        drugs=for (y <- Q_22) where (x.drugs == y.in) [y.out])]
```
Notice that in Q_21 we replaced the call to F with its argument x, while Q_22 is just Q_F again. The final expression Q2' builds the nested result (in the host language's memory) by traversing Q_21 and computing the set value of each drugs field by looking up the appropriate values in Q_22. Thus, the original query result can be computed by first evaluating Q_21 and Q_22 on the database, and then evaluating the final stitching query expression in-memory. (In practice, as discussed in Cheney et al. [8], it is important for performance to use a more sophisticated stitching algorithm than the above naive nested loop, but in this paper we are primarily concerned with the correctness of the transformation.)
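The refactoring and the naive stitching step can likewise be simulated in Python. This is an illustrative sketch under invented data in the spirit of Figure 1 (the cid/did values are hypothetical); a set comprehension plays the role of Q_F.

```python
# Invented data in the spirit of Figure 1 (the cid/did values are hypothetical).
Cand = [(1, 'DJT'), (2, 'JRB')]                      # (cid, name)
Pres = [(1, 101), (1, 101), (1, 102), (2, 103)]      # (cid, did)
Drug = [(101, 'adderall'), (102, 'hydrochloroquine'), (103, 'caffeine')]

# Q_21: the outer query, with the nested call replaced by its argument x.
Q_21 = [(name, (cid, name)) for (cid, name) in Cand]         # (name, key)

# Q_22 = Q_F: the deduplicated graph of F.
Q_22 = {((cid, name), drug)
        for (cid, name) in Cand
        for (pcid, pdid) in Pres
        for (did, drug) in Drug
        if cid == pcid and pdid == did}

# Q_2': stitch the two flat results into a nested one (naive nested loop).
Q2 = [(name, {out for (inp, out) in Q_22 if inp == key})
      for (name, key) in Q_21]

assert Q2 == [('DJT', {'adderall', 'hydrochloroquine'}),
              ('JRB', {'caffeine'})]
```

Only two flat queries are evaluated; the nesting is reconstructed afterwards, which is the point of the shredding approach.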

The above examples are a bit simplistic, but they illustrate the key idea of query lifting. In the rest of this paper we place this approach on a solid foundation, and (partially inspired by Gibbons et al. [20]), to help clarify the reasoning, we extend the calculus with a type of tabulated functions or graphs σ⃗ ⇒ {τ}, with graph abstraction introduction form G(−; −) and graph application M ⊛ (x⃗). In our running example we could define Q_F = G(x ← Cand; F(x)), and we would use the application operation Q_F ⊛ (x) to extract the set of elements corresponding to x in Q_F. We will also consider tabular functions that return multisets rather than sets, in order to deal with queries that return nested multisets.

# **3 Background**

We recap the main points from [42], which introduced a calculus NRC<sub>λ</sub>(Set, Bag) with the following syntax:

$$\begin{array}{lrcl}
\textbf{Types} & \sigma,\tau & ::= & b \mid \langle \overrightarrow{\ell : \sigma} \rangle \mid \{\sigma\} \mid ⟅\sigma⟆ \mid \sigma \to \tau \\
\textbf{Terms} & M,N & ::= & x \mid t \mid c(\overrightarrow{M}) \mid \langle \overrightarrow{\ell = M} \rangle \mid M.\ell \mid \lambda x.M \mid M\;N \\
& & \mid & \emptyset \mid \{M\} \mid M \cup N \mid \bigcup\{M \mid \Theta\} \\
& & \mid & ⟅⟆ \mid ⟅M⟆ \mid M \uplus N \mid \biguplus⟅M \mid \Theta⟆ \\
& & \mid & \delta M \mid \iota M \mid M\ \mathbf{where}_{\mathsf{set}}\ N \mid M\ \mathbf{where}_{\mathsf{bag}}\ N \\
& & \mid & \mathbf{empty}_{\mathsf{set}}(M) \mid \mathbf{empty}_{\mathsf{bag}}(M) \\
\textbf{Generators} & \Theta & ::= & \overrightarrow{x \leftarrow M}
\end{array}$$

We distinguish between (local) variables x and (global) table names t, and assume standard primitive types b and primitive operations c(M⃗), including Booleans **B** and equality at every base type. The syntax for records and record projection ⟨ℓ⃗ = M⃗⟩, M.ℓ, and for lambda-abstraction and application λx.M, M N is standard; as usual, let-binding is definable. Set operations include the empty set ∅, singleton construction {M}, union M ∪ N, one-armed conditional M **where**<sub>set</sub> N, emptiness test **empty**<sub>set</sub>(M), and comprehension ⋃{M | Θ}, where Θ is a sequence of generators x ← M. Similarly, multiset operations include the empty bag ⟅⟆, singleton ⟅M⟆, bag union M ⊎ N, bag difference M − N, conditional M **where**<sub>bag</sub> N, and emptiness test **empty**<sub>bag</sub>(M). The syntax is completed by duplicate elimination δM (converting a bag M into a set with the same object type) and promotion ιM (which produces the bag containing all the elements of the set M, each with multiplicity 1).

The one-way conditional operations M **where**<sub>set</sub> N and M **where**<sub>bag</sub> N evaluate the Boolean test N, and return the collection M if N is true, otherwise the empty set/bag; two-way conditionals can be supported without problems. Other set operations, such as intersection, membership, subset, and equality, are also definable, as are bag operations such as intersection [4,33]. Also, we may define **empty**<sub>bag</sub>(M) as **empty**<sub>set</sub>(δM) and M **where**<sub>set</sub> N as δ(ιM **where**<sub>bag</sub> N), but we prefer to include these constructs as primitives for symmetry. Generally, we will allow ourselves to write M **where** N and **empty**(M) without subscripts if the collection kind of these operations is irrelevant or made clear by the context. We freely use syntax for unlabeled tuples ⟨M⃗⟩, M.i and tuple types ⟨σ⃗⟩, and consider them to be syntactic sugar for labeled records.
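These definability claims are easy to check concretely. The following Python sketch (illustrative only, not part of the formal development) models sets as Python sets and bags as `collections.Counter` values, and verifies that **empty**<sub>bag</sub>(M) agrees with **empty**<sub>set</sub>(δM) and that M **where**<sub>set</sub> N agrees with δ(ιM **where**<sub>bag</sub> N):

```python
from collections import Counter

def dedup(bag):               # δ: bag -> set with the same elements
    return set(bag)

def promote(s):               # ι: set -> bag, each element with multiplicity 1
    return Counter(s)

def where_bag(bag, cond):     # M where_bag N
    return bag if cond else Counter()

def where_set(s, cond):       # M where_set N
    return s if cond else set()

def empty_bag(bag):           # empty_bag(M)
    return len(bag) == 0

def empty_set(s):             # empty_set(M)
    return len(s) == 0

# empty_bag(M) agrees with empty_set(dedup(M)), on empty and non-empty bags
for m in (Counter({'a': 2, 'b': 1}), Counter()):
    assert empty_bag(m) == empty_set(dedup(m))

# where_set agrees with dedup(where_bag(promote(M), N)), for both test outcomes
for cond in (True, False):
    assert where_set({'x', 'y'}, cond) == dedup(where_bag(promote({'x', 'y'}), cond))
```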

The typing rules for the calculus are standard and provided in the full version of this paper [44]. For the purposes of this discussion, we will highlight two features of the type system. The first is that the calculus used here differs from our previous work by using constants and table names, whose types are described by a fixed signature Σ:

$$\frac{\Sigma(c) = \overrightarrow{b} \to b \qquad (\Gamma \vdash M_i : b_i)_{i=1,\ldots,n}}{\Gamma \vdash c(\overrightarrow{M}) : b} \qquad\qquad \frac{\Sigma(t) = \langle \overrightarrow{\ell : b} \rangle}{\Gamma \vdash t : ⟅\langle \overrightarrow{\ell : b} \rangle⟆}$$

As usual, a typing judgment Γ ⊢ M : σ states that a term M is well-typed with type σ, assuming that its free variables have the types declared in the typing context Γ = x₁ : σ₁, …, x_k : σ_k. For the two rules above, note in particular that the primitive functions c can only take inputs of base type and produce results of base type, and table constants t are always multisets of records whose fields are of base type. We refer to a type of the form ⟨ℓ⃗ : b⃗⟩ as flat; if σ is flat, we refer to {σ} and ⟅σ⟆ as flat collection types.

The second is that our type system uses an approach à la Church, meaning that variable abstractions (in lambdas/comprehensions), empty sets, and empty bags are annotated with their type in order to ensure the uniqueness of typing.

**Lemma 1.** In NRC<sub>λ</sub>(Set, Bag), if Γ ⊢ M : σ and Γ ⊢ M : τ, then σ = τ.

In the context of a larger language implementation, most of these type annotations can be elided and inferred by type inference. We have chosen to dispense with these details in the main body of this paper to avoid unnecessary syntactic cluttering.

We will use a largely standard denotational semantics for NRC<sub>λ</sub>(Set, Bag), in which sets and multisets are modeled as finitely-supported functions from their element types to Boolean values {0, 1} or to natural numbers, respectively. This approach follows the so-called K-relation semantics for queries [23,18], as used for example in the HoTTSQL formalization [10]. The full typing rules and semantics are included in the full version of this paper [44].
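As an informal illustration of this style of semantics, the sketch below models a bag as a finitely-supported multiplicity map (a Python `Counter`) and a set as the special case where multiplicities are clamped to {0, 1}; bag union adds multiplicities, and bag difference is truncated subtraction, in the style of SQL's EXCEPT ALL. All names are illustrative.

```python
from collections import Counter

# A bag denotes a finitely-supported map from elements to natural numbers;
# a set is the special case where every multiplicity is 0 or 1.
def bag_union(m, n):                 # multiplicities add
    return Counter(m) + Counter(n)

def bag_diff(m, n):                  # truncated subtraction
    return Counter(m) - Counter(n)   # Counter subtraction drops non-positive counts

def set_union(m, n):                 # Boolean 'or' on multiplicities
    return {k: 1 for k in set(m) | set(n)}

def clamp(m):                        # δ: clamp multiplicities to {0, 1}
    return {k: 1 for k in m if m[k] > 0}

def lift(s):                         # ι: each element with multiplicity 1
    return Counter({k: 1 for k in s if s[k] > 0})

m = Counter({'a': 2, 'b': 1})
n = Counter({'a': 1, 'c': 3})
assert bag_union(m, n) == Counter({'a': 3, 'b': 1, 'c': 3})
assert bag_diff(m, n) == Counter({'a': 1, 'b': 1})
assert clamp(m) == {'a': 1, 'b': 1}
assert lift(clamp(m)) == Counter({'a': 1, 'b': 1})
assert set_union(clamp(m), {'c': 1}) == {'a': 1, 'b': 1, 'c': 1}
```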

NRC<sub>λ</sub>(Set, Bag) subsumes previous systems including NRC [4,55], BQL [33] and NRC<sub>λ</sub> [11,8]. In this paper, we restrict our attention to queries in which the collection types taking part in δ, ι, or bag difference contain only flat records. There are various reasons for excluding function types from these operators: first, any concrete implementation that used function types in these positions would need to decide the equality of functions; second, our rewrite system can ensure that a term whose type does not contain function types has a normal form without lambda abstractions and applications only if every δ, ι, or bag difference used in that term is applied to first-order collections. We thus want to exclude terms such as:

$$\biguplus⟅\, x\ ⟅1⟆\ ⟅2⟆ \mid x \leftarrow \iota(\{\lambda y z.\, y\} \cup \{\lambda y z.\, z\})\,⟆$$

which do not have an SQL representation despite having a flat collection type.

In order to obtain simpler normal forms, in which comprehensions only use generators of flat collection type, we also disallow nested collections within δ, ι, and bag difference. We believe this is without loss of generality, because of Libkin and Wong's results [33] showing that allowing such operations at nested types does not add expressiveness to BQL.

We have extended Wong's normalizing rewrite system so as to simplify queries to a form that is close to SQL, with no intermediate nested collections. Since our calculus is more liberal than Wong's, allowing queries to be defined by mixing sets and bags and also using bag difference, we have added non-standard rules to handle the problematic cases. In particular, we use the following constrained eta-expansions for comprehensions:

$$\bigcup\{\delta(M - N) \mid \Theta\} \;\leadsto\; \bigcup\{\{z\} \mid \Theta, z \leftarrow \delta(M - N)\}$$

$$\biguplus⟅\iota M \mid \Theta⟆ \;\leadsto\; \biguplus⟅⟅z⟆ \mid \Theta, z \leftarrow \iota M⟆$$

$$\biguplus⟅M - N \mid \Theta⟆ \;\leadsto\; \biguplus⟅⟅z⟆ \mid \Theta, z \leftarrow M - N⟆$$

**General normal forms** M ::= X | ⟨ℓ⃗ = M⃗⟩ | Q | R

**Base type terms** X ::= x.ℓ | c(X⃗) | **empty**<sub>set</sub>(Q\*) | **empty**<sub>bag</sub>(R\*)

**Set normal forms** Q ::= ⋃ C⃗ &nbsp;&nbsp; C ::= ⋃{ {M} **where**<sub>set</sub> X | x⃗ ← F⃗ } &nbsp;&nbsp; F ::= δt | δ(R\*₁ − R\*₂)

**Bag normal forms** R ::= ⨄ D⃗ &nbsp;&nbsp; D ::= ⨄⟅ ⟅M⟆ **where**<sub>bag</sub> X | x⃗ ← G⃗ ⟆ &nbsp;&nbsp; G ::= t | ιQ\* | R\*₁ − R\*₂

**Fig. 3.** Nested relational normal forms.

The rationale for these rules is that, in order to achieve for comprehensions a form that can be easily translated to an SQL select query, we need to move all the syntactic forms that block most normalization rules (i.e., promotion and bag difference) from the head of the comprehension to a generator. For this strategy to work, we also need to know that the type of these subexpressions is flat, as previously mentioned.

In Figure 3 we show the grammar for the normal forms for terms of nested relational types, i.e. types of the following form:

$$\sigma ::= b \mid \langle \overrightarrow{\ell : \sigma} \rangle \mid \{\sigma\} \mid ⟅\sigma⟆$$

For ease of presentation, the grammar actually describes a "standardized" version of the normal forms in which:

**–** comprehensions always carry an explicit guard, with **true** supplied when none is present:

$$\bigcup \{ \{ M \} \mid \Theta \} = \bigcup \{ \{ M \}\ \mathbf{where\ true} \mid \Theta \}$$

**–** singletons that do not appear as the head of a comprehension are represented as trivial comprehensions:

$$\{M\} = \bigcup \{ \{M\} \mid \; \}$$

Each normal form M can be either a term of base type X, a tuple ⟨ℓ⃗ = M⃗⟩, a set Q, or a bag R. The normal forms of sets and bags are rather similar, both being defined as unions of comprehensions with a singleton head. The generators F for set comprehensions include deduplicated tables and deduplicated bag differences; the generators G for bag comprehensions must be either tables, promoted set queries, or bag differences.

The non-terminals used as the arguments of emptiness tests, promotion, and bag difference have been marked with a star to emphasize the fact that they

```
(∅)_sql             = SELECT 42 WHERE 0=1
(⟅⟆)_sql            = SELECT 42 WHERE 0=1
(x.ℓ)_sql           = x.ℓ
(c(X⃗))_sql          = (c)_sql((X⃗)_sql)
(⟨ℓ⃗ = X⃗⟩)_sql       = (X₁)_sql AS ℓ₁, …, (Xₙ)_sql AS ℓₙ
(empty_set(Q*))_sql = NOT EXISTS ((Q*)_sql)
(empty_bag(R*))_sql = NOT EXISTS ((R*)_sql)
(Q*₁ ∪ Q*₂)_sql     = (Q*₁)_sql UNION (Q*₂)_sql
(R*₁ ⊎ R*₂)_sql     = (R*₁)_sql UNION ALL (R*₂)_sql
(t)_sql             = SELECT * FROM t
(R*₁ − R*₂)_sql     = (R*₁)_sql EXCEPT ALL (R*₂)_sql
(δt)_sql            = SELECT DISTINCT * FROM t
(ι(Q*))_sql         = (Q*)_sql
(δ(R*₁ − R*₂))_sql  = SELECT DISTINCT * FROM ((R*₁)_sql EXCEPT ALL (R*₂)_sql) r
(x ← F)_sql         = ((F)_sql) x            if F is closed
                    = LATERAL ((F)_sql) x    otherwise
(x ← G)_sql         = ((G)_sql) x            if G is closed
                    = LATERAL ((G)_sql) x    otherwise
(⋃{ {M*} where_set X | x⃗ ← F⃗ })_sql = SELECT DISTINCT (M*)_sql FROM (x₁ ← F₁)_sql, …, (xₙ ← Fₙ)_sql WHERE (X)_sql
(⨄⟅ ⟅M*⟆ where_bag X | x⃗ ← G⃗ ⟆)_sql = SELECT (M*)_sql FROM (x₁ ← G₁)_sql, …, (xₙ ← Gₙ)_sql WHERE (X)_sql
```

**Fig. 4.** Translation to SQL

must have a flat collection type. The corresponding grammar can be obtained from the grammar for nested normal forms by replacing the rule for M with the following:

$$M^\* \coloneqq \langle \overrightarrow{\ell = X} \rangle$$

Normalized queries can be translated to SQL as shown in Figure 4, as long as they have a flat collection type. The translation uses SELECT DISTINCT and UNION where set semantics is needed, and SELECT, UNION ALL, and EXCEPT ALL in the case of bag semantics. Note that promotion expressions ιQ\* are translated simply by translating Q\*, because in SQL there is no type distinction between set and multiset queries: all query results are multisets, and sets are considered to be multisets having no duplicates.

The other main complication in this translation is the handling of generators x ← F, x ← G, where F or G may be a non-closed expression ι(Q\*), R\*₁ − R\*₂, or δ(R\*₁ − R\*₂) containing references to other locally-bound variables. To deal with the resulting lateral variable references, we add the LATERAL keyword to such queries. As explained earlier, the use of LATERAL can be problematic, and we will return to this issue in Section 5.

Remark 1 (Record flattening). The above translations handle queries that take flat tables as input and produce flat results (collections of flat records ⟨ℓ⃗ : b⃗⟩). It is straightforward to support queries that return nested records (i.e., records containing other records, but not collections). For example, a query M : ⟅⟨⟨b₁, b₂⟩, b₃⟩⟆ can be handled by defining both directions of the obvious isomorphism N : ⟅⟨⟨b₁, b₂⟩, b₃⟩⟆ ≅ ⟅⟨b₁, b₂, b₃⟩⟆ : N⁻¹, normalizing the flat query N ∘ M, evaluating the corresponding SQL, and applying the inverse N⁻¹ to the results. Such record flattening is described in detail by Cheney et al. [9] and is implemented in Links, so we will use it from now on without further discussion.
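A minimal sketch of such an isomorphism, with Python tuples standing in for records; the function names N and N_inv follow the remark, and everything else here is illustrative:

```python
# Record flattening: N : <<b1, b2>, b3>  ~=  <b1, b2, b3> : N_inv,
# representing records as Python tuples (a simplification of Links records).
def N(row):                      # flatten one nested record
    (b1, b2), b3 = row
    return (b1, b2, b3)

def N_inv(row):                  # rebuild the nested record
    b1, b2, b3 = row
    return ((b1, b2), b3)

# Flatten a query result, "run" the flat query, then restore the nesting.
nested = [((1, 'a'), True), ((2, 'b'), False)]
flat = [N(r) for r in nested]            # what would be sent to the database
assert [N_inv(r) for r in flat] == nested
```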

$$\frac{(\Gamma, x_1 : \sigma_1, \ldots, x_{i-1} : \sigma_{i-1} \vdash L_i : \{\sigma_i\})_{i=1,\ldots,n} \qquad \Gamma, \overrightarrow{x : \sigma} \vdash M : \{\tau\}}{\Gamma \vdash \mathcal{G}^{\mathsf{set}}(\overrightarrow{x \leftarrow L}; M) : \overrightarrow{\sigma} \Rightarrow \{\tau\}}$$

$$\frac{(\Gamma, x_1 : \sigma_1, \ldots, x_{i-1} : \sigma_{i-1} \vdash L_i : \{\sigma_i\})_{i=1,\ldots,n} \qquad \Gamma, \overrightarrow{x : \sigma} \vdash M : ⟅\tau⟆}{\Gamma \vdash \mathcal{G}^{\mathsf{bag}}(\overrightarrow{x \leftarrow L}; M) : \overrightarrow{\sigma} \Rightarrow ⟅\tau⟆}$$

$$\frac{\Gamma \vdash M : \overrightarrow{\sigma} \Rightarrow \tau \qquad (\Gamma \vdash N_i : \sigma_i)_i}{\Gamma \vdash M \circledast (\overrightarrow{N}) : \tau}$$

$$\frac{\Gamma \vdash M : \overrightarrow{\sigma} \Rightarrow ⟅\tau⟆ \qquad \Gamma \vdash N : \overrightarrow{\sigma} \Rightarrow ⟅\tau⟆}{\Gamma \vdash M - N : \overrightarrow{\sigma} \Rightarrow ⟅\tau⟆} \qquad \frac{\Gamma \vdash M : \overrightarrow{\sigma} \Rightarrow \{\tau\} \qquad \Gamma \vdash N : \overrightarrow{\sigma} \Rightarrow \{\tau\}}{\Gamma \vdash M \cup N : \overrightarrow{\sigma} \Rightarrow \{\tau\}}$$

$$\frac{\Gamma \vdash M : \overrightarrow{\sigma} \Rightarrow ⟅\tau⟆ \qquad \Gamma \vdash N : \overrightarrow{\sigma} \Rightarrow ⟅\tau⟆}{\Gamma \vdash M \uplus N : \overrightarrow{\sigma} \Rightarrow ⟅\tau⟆} \qquad \frac{\Gamma \vdash M : \overrightarrow{\sigma} \Rightarrow ⟅\tau⟆}{\Gamma \vdash \delta M : \overrightarrow{\sigma} \Rightarrow \{\tau\}} \qquad \frac{\Gamma \vdash M : \overrightarrow{\sigma} \Rightarrow \{\tau\}}{\Gamma \vdash \iota M : \overrightarrow{\sigma} \Rightarrow ⟅\tau⟆}$$

**Fig. 5.** N RC<sup>G</sup> additional typing rules.

# **4 A relational calculus of tabular functions**

We now introduce NRC<sub>G</sub>, an extension of the calculus NRC<sub>λ</sub>(Set, Bag) providing a new type of finite tabular function graphs (in the remainder of this paper, also called simply "graphs"; they are similar to the finite maps and tables of Gibbons et al. [20]). The syntax of NRC<sub>G</sub> is defined as follows:

**Types** σ, τ ::= ⋯ | σ⃗ ⇒ τ &nbsp;&nbsp;&nbsp; **Terms** M, N ::= ⋯ | G<sup>set</sup>(Θ; N) | G<sup>bag</sup>(Θ; N) | M ⊛ (N⃗)

Semantically, the type of graphs σ⃗ ⇒ τ will be interpreted as the set of finite functions from sequences of values of type σ⃗ to values in τ: such functions can return non-trivial values only for a finite subset of their input type. In our setting, we will require the output type of graphs to be a collection type (i.e., τ shall be either {τ′} or ⟅τ′⟆ for some τ′), and we will use ∅ or ⟅⟆ as the trivial value. The typing rules involving graphs are shown in Figure 5.

Graphs are created using the graph abstraction operations G<sup>set</sup>(Θ; N) and G<sup>bag</sup>(Θ; N), where Θ is a sequence of generators of the form x ← M; the dual operation of graph application is denoted by M ⊛ (N⃗). An expression of the form G<sup>set</sup>(x⃗ ← M⃗; N) is used to construct a (finite) tabular function mapping each sequence of values R₁, …, Rₙ drawn from the sets M₁, …, Mₙ to the set N[R⃗/x⃗]. If each Mᵢ has type {σᵢ} and N has type {τ}, then the graph has type σ⃗ ⇒ {τ}. Similarly, if N has type ⟅τ⟆, then G<sup>bag</sup>(x⃗ ← M⃗; N) has type σ⃗ ⇒ ⟅τ⟆. The terms M₁, …, Mₙ constitute the (finite) domain of this graph. When the kind of graph abstraction (set-based or bag-based) is clear from the context or unimportant, we will allow ourselves to write G(−; −) instead of G<sup>set</sup>(−; −) or G<sup>bag</sup>(−; −).

A graph G of type σ⃗ ⇒ τ can be applied to a sequence of terms N₁, …, Nₙ of types σ₁, …, σₙ to obtain a term of type τ. If G = G(x⃗ ← L⃗; M), then we will want the semantics of G(x⃗ ← L⃗; M) ⊛ (N⃗) to be the same as that of M[N⃗/x⃗], provided that each Nᵢ is in the corresponding component of the domain of the graph. The typing rule does not enforce this requirement; if any Nᵢ is not an element of Lᵢ, the graph application evaluates to an empty set or bag (depending on τ).
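The intended behavior can be sketched in Python, representing a bag-valued graph as a `Counter` of (input, output) pairs; the single-generator case suffices for illustration, and all names here are hypothetical:

```python
from collections import Counter

def graph_bag(domain, f):
    """G^bag(x <- L; M): tabulate f over the finite domain L as a bag of
    (input, output) pairs. Single-generator case only, for illustration."""
    g = Counter()
    for x in domain:                    # domain: the (deduplicated) values of L
        for y, mult in f(x).items():    # f(x): a bag (Counter) of outputs
            g[(x, y)] += mult
    return g

def apply_graph(g, x):
    """M (x): the bag of outputs tabulated against x; empty if x is
    outside the tabulated domain."""
    return Counter({y: m for ((x2, y), m) in g.items() if x2 == x})

# A toy tabular function over the domain {1, 2}.
f = lambda x: Counter({x * 10: 2})
g = graph_bag([1, 2], f)
assert apply_graph(g, 1) == Counter({10: 2})      # inside the domain
assert apply_graph(g, 3) == Counter()             # outside: the trivial value
```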

Graphs can also be merged by union, using ∪ or ⊎ depending on their output collection kind. Furthermore, graphs that return bags can be subtracted from one another using bag difference; the deduplication and promotion operations also extend to graphs in the obvious way.

**Lemma 2.** In NRC<sub>G</sub>, if Γ ⊢ M : σ and Γ ⊢ M : τ, then σ = τ.

Whenever M is well-typed and its typing environment is made clear by the context, we will allow ourselves to write ty(M) for the type of M. Furthermore, given a sequence of generators Θ = x₁ ← L₁, …, xₙ ← Lₙ, such that for i = 1, …, n we have x₁ : σ₁, …, x_{i−1} : σ_{i−1} ⊢ Lᵢ : {σᵢ}, we will write ty(Θ) to denote the associated typing context:

$$ty(\Theta) := x\_1 : \sigma\_1, \dots, x\_n : \sigma\_n$$

# **4.1 Semantics and translation to NRC<sub>λ</sub>(Set, Bag)**

The semantics of NRC<sub>λ</sub>(Set, Bag) is extended to NRC<sub>G</sub> as follows:

$$\begin{array}{l}
[\![\mathcal{G}^{\mathsf{set}}(\overrightarrow{x \leftarrow L}; M)]\!]\,\rho\,(\overrightarrow{u}, v) = \left(\bigwedge_i [\![L_i]\!]\,\rho[x_1 \mapsto u_1, \ldots, x_{i-1} \mapsto u_{i-1}]\,u_i\right) \wedge [\![M]\!]\,\rho[\overrightarrow{x \mapsto u}]\,v \\[1ex]
[\![\mathcal{G}^{\mathsf{bag}}(\overrightarrow{x \leftarrow L}; M)]\!]\,\rho\,(\overrightarrow{u}, v) = \left(\bigwedge_i [\![L_i]\!]\,\rho[x_1 \mapsto u_1, \ldots, x_{i-1} \mapsto u_{i-1}]\,u_i\right) \times [\![M]\!]\,\rho[\overrightarrow{x \mapsto u}]\,v \\[1ex]
[\![M \circledast (\overrightarrow{N})]\!]\,\rho\,v = [\![M]\!]\,\rho\,([\![\overrightarrow{N}]\!]\,\rho, v)
\end{array}$$

In this definition, graph abstractions are interpreted as collections of pairs of values (u⃗, v), where u⃗ represents the input and v the corresponding output of the graph; consequently, the semantics of a graph G<sup>set</sup>(x⃗ ← L⃗; M) states that the multiplicity of (u⃗, v) is equal to the multiplicity of v in the semantics of M (where each xᵢ is mapped to uᵢ) if each uᵢ is in the semantics of Lᵢ, and zero otherwise. The semantics of bag graph abstractions is similar, with × substituted for ∧ to allow multiplicities greater than one in the graph output.

For graph applications M ⊛ (N⃗), the multiplicity of v is obtained as the multiplicity of (⟦N⃗⟧ρ, v) in the semantics of M. The semantics of set and bag union, bag difference, bag deduplication, and set promotion, as defined in NRC<sub>λ</sub>(Set, Bag), are extended to graphs and remain otherwise unchanged in NRC<sub>G</sub>.

In fact (as noted for example by Gibbons et al. [20]), the graph constructs of NRC<sub>G</sub> are just a notational convenience: we can translate NRC<sub>G</sub> back to NRC<sub>λ</sub>(Set, Bag) by translating the types σ⃗ ⇒ {τ} and σ⃗ ⇒ ⟅τ⟆ to {⟨σ⃗, τ⟩} and ⟅⟨σ⃗, τ⟩⟆ respectively, and the term constructs are rewritten as follows:

$$\begin{array}{lcll}
\mathcal{G}^{\mathsf{set}}(\overrightarrow{x \leftarrow L}; M) &\rightsquigarrow& \bigcup \{ \{\langle \overrightarrow{x}, y \rangle\} \mid \overrightarrow{x \leftarrow L}, y \leftarrow M \} \\
\mathcal{G}^{\mathsf{bag}}(\overrightarrow{x \leftarrow L}; M) &\rightsquigarrow& \biguplus \{\!|\, \{\!|\langle \overrightarrow{x}, y \rangle|\!\} \mid \overrightarrow{x \leftarrow \iota(L)}, y \leftarrow M \,|\!\} \\
M \circledast (\overrightarrow{N}) &\rightsquigarrow& \bigcup \{ \{y\} \text{ where } \overrightarrow{x} = \overrightarrow{N} \mid \langle \overrightarrow{x}, y \rangle \leftarrow M \} & (M : \overrightarrow{\sigma} \Rightarrow \{\tau\}) \\
M \circledast (\overrightarrow{N}) &\rightsquigarrow& \biguplus \{\!|\, \{\!|y|\!\} \text{ where } \overrightarrow{x} = \overrightarrow{N} \mid \langle \overrightarrow{x}, y \rangle \leftarrow M \,|\!\} & (M : \overrightarrow{\sigma} \Rightarrow \{\!|\tau|\!\})
\end{array}$$

# **5 Delateralization**

As explained at the end of Section 3, if a subexpression of the form ι(N) or N₁ − N₂ contains free variables introduced by other generators in the query (i.e. not globally-scoped table variables), the query cannot be translated directly to SQL unless the SQL:1999 LATERAL keyword is used.

More precisely, we can give the following definition of lateral variable occurrence.

**Definition 1.** Given a query containing a comprehension ⋃{M | Θ, x ← N, Θ′} or ⊎{|M | Θ, x ← N, Θ′|} as a subterm, we say that x occurs laterally in Θ′ if, and only if, there is a binding y ← N′ in Θ′ such that x ∈ FV(N′).
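Definition 1 can be checked mechanically; in the sketch below (our own toy representation, not from the paper), a comprehension body is given as an ordered list of (bound variable, free variables of the source) pairs, and the lateral variables are exactly those free variables already bound by an earlier generator.

```python
def lateral_vars(gens):
    """gens: ordered (variable, free_vars_of_source) pairs for Θ, x ← N, Θ′."""
    seen, lateral = set(), set()
    for var, fv in gens:
        lateral |= fv & seen       # free vars bound by an earlier generator
        seen.add(var)
    return lateral

# ⊎{|M | x ← table, y ← ι(P)|} where P mentions x: x occurs laterally
print(lateral_vars([("x", set()), ("y", {"x"})]))   # {'x'}
```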

Since LATERAL is not implemented on all databases, and is sometimes implemented inefficiently, we would still like to avoid it. In this section we show how lateral occurrences can be eliminated even in the presence of bag promotion and bag difference, by means of a process we call delateralization.

Using the NRC_G constructs, we can delateralize simple cases of deduplication or multiset difference as follows:

$$\begin{array}{lcl}
\biguplus \{\!|\, M \mid x \leftarrow N,\ y \leftarrow \iota(P) \,|\!\} &\rightsquigarrow& \biguplus \{\!|\, M \mid x \leftarrow N,\ y \leftarrow \iota(\mathcal{G}(x \leftarrow \delta N; P) \circledast x) \,|\!\} \\
\biguplus \{\!|\, M \mid x \leftarrow N,\ y \leftarrow P_{1} - P_{2} \,|\!\} &\rightsquigarrow& \biguplus \{\!|\, M \mid x \leftarrow N,\ y \leftarrow (\mathcal{G}(x \leftarrow \delta N; P_{1}) - \mathcal{G}(x \leftarrow \delta N; P_{2})) \circledast x \,|\!\} \\
\bigcup \{ M \mid x \leftarrow N,\ y \leftarrow \delta(P_{1} - P_{2}) \} &\rightsquigarrow& \bigcup \{ M \mid x \leftarrow N,\ y \leftarrow \delta((\mathcal{G}(x \leftarrow N; P_{1}) - \mathcal{G}(x \leftarrow N; P_{2})) \circledast x) \}
\end{array}$$

It is necessary to deduplicate N in the first two rules to ensure that the results correctly represent finite maps from the distinct elements of N to multisets of corresponding elements of P. (In any case, N needs to be deduplicated in order to be used as a set in G(x ← δN; ·).)
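The first rule can be checked on a small example; in the sketch below (lists model bags, Python sets model sets, and `N`, `P` are hypothetical data), the lateral form and the delateralized form, which tabulates the graph once over the deduplicated `N` and joins it back on `x`, agree as multisets.

```python
from collections import Counter

N = [1, 1, 2]                      # bag generator, note the duplicate
P = lambda x: {x, x + 1}           # set-valued P with x free (lateral)

# lateral form: y ranges over ι(P), i.e. each element of the set once
lateral = [(x, y) for x in N for y in sorted(P(x))]

# delateralized form: tabulate the graph G(x ← δN; P) once, then join on x
graph = [(x, y) for x in sorted(set(N)) for y in sorted(P(x))]
delat = [(x, y) for x in N for (x2, y) in graph if x2 == x]

print(Counter(lateral) == Counter(delat))   # True
```

Note how the duplicate occurrence of 1 in `N` is restored by the join, even though the graph itself is keyed by the distinct elements only.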

Given a query expression in normal form, the above rules together with standard equivalences (such as commutativity of independent generators) can be used to delateralize it: that is, remove all occurrences of free variables in subexpressions of the form ι(N), M<sup>1</sup> − M2, or δ(M<sup>1</sup> − M2).

**Theorem 1.** If M is a flat query in normal form, then there exists M′ equivalent to M with no lateral variable occurrences.

The proofs of correctness of the basic delateralization rules and of the above theorem are in the full version of this paper [44].

To illustrate some subtleties of the translation, here is a trickier example:

$$\biguplus \{\!|\, M \mid x \leftarrow N,\ y \leftarrow Q - \iota(P) \,|\!\}$$

where Q, P both depend on x. We proceed from the outside in, first delateralizing the difference:

$$\biguplus \{\!|\, M \mid x \leftarrow N,\ y \leftarrow (\mathcal{G}(x \leftarrow \delta(N); Q) - \mathcal{G}(x \leftarrow \delta(N); \iota(P))) \circledast x \,|\!\}$$

Note that this still contains a lateral subquery: ι(P) depends on x. After translating back to NRC_λ(Set, Bag) and delateralizing ι(P), the query normalizes to:

$$\begin{array}{l}
Q_1 = \bigcup \{ \langle x, z \rangle \mid x \leftarrow \delta(N),\ z \leftarrow P \} \\
Q_2 = \biguplus \{\!|\, \langle x, z \rangle \mid x \leftarrow \iota\delta(N),\ z \leftarrow Q \,|\!\} - \biguplus \{\!|\, \langle x, z \rangle \mid x \leftarrow \iota\delta(N),\ \langle x', z \rangle \leftarrow \iota(Q_1),\ x = x' \,|\!\} \\
\biguplus \{\!|\, M \mid x \leftarrow N,\ \langle x', y \rangle \leftarrow Q_2,\ x = x' \,|\!\}
\end{array}$$
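The equivalence of the original lateral query and this pipeline can be checked concretely. In the sketch below (hypothetical `Q`, `P`, with `Counter`s modelling bags and Counter subtraction modelling bag difference), `direct` evaluates the lateral query per occurrence of x, and `final` evaluates the three normalized, lateral-free queries.

```python
from collections import Counter

N = [1, 1, 2]                      # bag-valued N (with a duplicate)
Q = lambda x: [x, x, x + 1]        # bag-valued Q, depending on x
P = lambda x: {x}                  # set-valued P, depending on x

# original lateral query: for each x, subtract one copy of each element of P(x)
direct = Counter()
for x in N:
    for y, n in (Counter(Q(x)) - Counter(P(x))).items():
        direct[(x, y)] += n

# normalized pipeline: Q1 and Q2 contain no lateral variable occurrences
Q1 = {(x, z) for x in set(N) for z in P(x)}
Q2 = (Counter((x, z) for x in set(N) for z in Q(x))
      - Counter((x, z) for x in set(N) for (x2, z) in Q1 if x2 == x))
final = Counter()
for x in N:
    for (x2, y), n in Q2.items():
        if x2 == x:
            final[(x, y)] += n

print(direct == final)             # True
```

The join on x = x′ in the final query restores the original multiplicities coming from duplicates in N, since Q₂ is keyed by the distinct elements only.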

# **6 Query lifting and shredding**

In the previous sections, we have discussed how to translate queries with flat collection input and output to SQL. The shredding technique, introduced in [8], can be used to convert queries with nested output (but flat input) to multiple flat queries that can be independently evaluated on an SQL database, then stitched together to obtain the required nested result. This section provides an improved version of shredding, extended to a more liberal setting mixing sets and bags and allowing bag difference operations, and described using the graph operations we have introduced, allowing an easier understanding of the shredding process.

We introduce, in Figure 6, a shredding judgment to denote the process by which, given a normalized NRC_λ(Set, Bag) query, each of its subqueries having a nested collection type is lifted (in a manner analogous to lambda-lifting [30]) to an independent graph query: more specifically, shredding produces a shredding environment (denoted by Φ, Ψ, …), which is a finite map associating special graph variables φ, ψ to NRC_G terms:

$$\Phi, \Psi, \ldots ::= [\overrightarrow{\varphi \mapsto M}]$$

The shredding judgment has the following form:

$$
\Phi; \Theta \vdash M \Rightarrow \breve{M} \mid \Psi
$$

where the ⇒ symbol separates the input (to its left) from the output (to its right). The normalized NRC_λ(Set, Bag) term M is the query being considered for shredding; M may contain free variables declared in Θ, which must be a sequence of NRC_λ(Set, Bag) set comprehension bindings. Θ is initially empty,

$$\frac{X \text{ is a base term}}{\Phi;\Theta \vdash X \Rightarrow X \mid \Phi}
\qquad
\frac{(\Phi_{i-1};\Theta \vdash M_i \Rightarrow \breve{M}_i \mid \Phi_i)_{i=1,\ldots,n}}{\Phi_0;\Theta \vdash \langle\overrightarrow{\ell = M}\rangle \Rightarrow \langle\overrightarrow{\ell = \breve{M}}\rangle \mid \Phi_n}$$

$$\frac{\varphi \notin \mathrm{dom}(\Phi_n) \qquad (\Phi_{i-1};\Theta \vdash C_i \Rightarrow \psi_i \circledast (\mathrm{dom}(\Theta)) \mid \Phi_i)_{i=1,\ldots,n}}{\Phi_0;\Theta \vdash \bigcup\overrightarrow{C} \Rightarrow \varphi \circledast (\mathrm{dom}(\Theta)) \mid (\Phi_n \setminus \overrightarrow{\psi})[\varphi \mapsto \bigcup\overrightarrow{\Phi_n(\psi)}]}$$

$$\frac{\varphi \notin \mathrm{dom}(\Phi_n) \qquad (\Phi_{i-1};\Theta \vdash D_i \Rightarrow \psi_i \circledast (\mathrm{dom}(\Theta)) \mid \Phi_i)_{i=1,\ldots,n}}{\Phi_0;\Theta \vdash \biguplus\overrightarrow{D} \Rightarrow \varphi \circledast (\mathrm{dom}(\Theta)) \mid (\Phi_n \setminus \overrightarrow{\psi})[\varphi \mapsto \biguplus\overrightarrow{\Phi_n(\psi)}]}$$

$$\frac{\varphi \notin \mathrm{dom}(\Psi) \qquad \Phi;\Theta,\overrightarrow{x \leftarrow F} \vdash M \Rightarrow \breve{M} \mid \Psi}{\Phi;\Theta \vdash \bigcup\{\{M\} \text{ where } X \mid \overrightarrow{x \leftarrow F}\} \Rightarrow \varphi \circledast (\mathrm{dom}(\Theta)) \mid \Psi[\varphi \mapsto \mathcal{G}(\Theta; \bigcup\{\{\breve{M}\} \text{ where } X \mid \overrightarrow{x \leftarrow F}\})]}$$

$$\frac{\varphi \notin \mathrm{dom}(\Psi) \qquad \Phi;\Theta,\overrightarrow{x \leftarrow G^{\delta}} \vdash M \Rightarrow \breve{M} \mid \Psi}{\Phi;\Theta \vdash \biguplus\{\!|\{\!|M|\!\} \text{ where } X \mid \overrightarrow{x \leftarrow G}|\!\} \Rightarrow \varphi \circledast (\mathrm{dom}(\Theta)) \mid \Psi[\varphi \mapsto \mathcal{G}(\Theta; \biguplus\{\!|\{\!|\breve{M}|\!\} \text{ where } X \mid \overrightarrow{x \leftarrow G}|\!\})]}$$

$$G^{\delta} \triangleq \begin{cases} Q^{*} & \text{if } G = \iota Q^{*} \\ \delta G & \text{otherwise} \end{cases}
\qquad
\Phi \setminus \overrightarrow{\psi} \triangleq [(\varphi \mapsto N) \in \Phi \mid \varphi \notin \overrightarrow{\psi}]$$

**Fig. 6.** Shredding rules.

but during shredding it is extended with parts of the input that have already been processed. Similarly, the input shredding environment Φ is initially empty, but will grow during shredding to collect shredded queries that have already been generated. It is crucial, for our algorithm to work, that M be in the form previously described in Figure 3, as this allows us to make assumptions on its shape: in describing the judgment rules, we will use the same metavariables as are used in that grammar.

The output of shredding consists of a shredded term M̆ and an output shredding environment Ψ. Ψ extends Φ with the new queries obtained by shredding M; M̆ is an output NRC_G query obtained from M by lifting its collection-typed subqueries to independent queries defined in Ψ.

The rules for the shredding judgment operate as follows: the first rule expresses the fact that a normalized base term X does not contain subexpressions with nested collection type, therefore it can be shredded to itself, leaving the shredding environment Φ unchanged; in the case of tuples, we perform shredding pointwise on each field, connecting the input and output shredding environments in a pipeline, and finally combining together the shredded subterms in the obvious way.

The shredding of collection terms (i.e. unions and comprehensions) is performed by means of query lifting: we turn the collection into a globally defined (graph) query, which will be associated to a fresh name ϕ and instantiated to the local comprehension context by graph application. This operation is reminiscent

$$\frac{}{[\,] : \cdot}
\qquad
\frac{\Phi : \Gamma \qquad \Gamma \vdash M : T \qquad T \text{ a graph type}}{\Phi[\varphi \mapsto M] : (\Gamma, \varphi : T)}$$

**Fig. 7.** Typing rules for shredding environments.

of the lambda lifting and closure conversion techniques used in the implementation of functional languages to convert local function definitions into global ones. Thus, when shredding a collection, besides processing its subterms recursively, we will need to extend the output shredding environment with a definition for the new global graph φ. In the interesting case of comprehensions, φ is defined by graph-abstracting over the comprehension context Θ; notice that, since we are only shredding normalized terms, we know that they have a certain shape and, in particular, the judgment for bag comprehensions must ensure that the generators G⃗ are converted into sets.

The shredding of set and bag unions is performed by recursion on the subterms, using the same plumbing technique we employed for tuples; additionally, we optimize the output shredding environment by removing the graph queries ψ⃗ resulting from the recursion, since they are absorbed into the new graph φ.

Notice that since the comprehension generators of our normalized queries must have a flat collection type, they do not need to be processed recursively. Furthermore, since our normal forms ensure that promotion and bag difference terms can only appear as comprehension generators, we do not need to provide rules for these cases.

The shredding environments used by the shredding judgment must be well typed, in the sense described by the rules of Figure 7: the judgment Φ : Γ means that the graph variables of Φ are mapped to terms whose type is described by Γ. Whenever we add a mapping [φ ↦ M] to Φ, we must make sure that M is well typed (of graph type) in the typing environment Γ associated with Φ.

If Φ : Γ, we will write ty(Φ) to refer to the typing environment Γ associated with Φ. The following result states that shredding preserves well-typedness:

**Theorem 2.** Let Θ be well-typed and ty(Θ) ⊢ M : σ. If Φ; Θ ⊢ M ⇒ M̆ | Ψ, with Φ well-typed, then:

**–** Ψ is well-typed;
**–** ty(Ψ), ty(Θ) ⊢ M̆ : σ.

We now intend to prove the correctness of shredding: first, we state a lemma which we can use to simplify certain expressions involving the semantics of graph application:

**Definition 2.** Let Θ be a closed, well-typed sequence of generators. A substitution ρ is a model of Θ (notation: ρ ⊨ Θ) if, and only if, for all x ∈ dom(Θ), we have ⟦Θ(x)⟧ (ρ(x)) > 0.

$$\text{\textbf{Lemma 3.}}\quad 1.\ [\![(\bigcup \overrightarrow{G}) \circledast (\overrightarrow{N})]\!]\,\rho = \bigvee_i [\![G_i \circledast (\overrightarrow{N})]\!]\,\rho$$

2. If ρ ⊨ Θ, then for all M we have ⟦G(Θ; M) ⊛ (dom(Θ))⟧ρ = ⟦M⟧ρ.

To state the correctness of shredding, we need the following notion of shredding environment substitution.

**Definition 3.** For every well-typed shredding environment Φ, the substitution of Φ into an NRC_G term M (notation: MΦ) is defined as the operation replacing within M every free variable φ ∈ dom(Φ) with (Φ(φ))Φ (i.e. the value assigned by Φ to φ, after recursively substituting Φ).

We can easily show that the above definition is well posed for well-typed Φ.
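Definition 3 amounts to a recursive replacement of graph variables; a minimal sketch, under an assumed term representation in which a graph variable appears as the tagged tuple `("var", φ)` and compound terms are lists:

```python
def subst(term, phi_env):
    if isinstance(term, tuple) and term[:1] == ("var",):
        # replace φ by Φ(φ), recursively substituting Φ into the result
        return subst(phi_env[term[1]], phi_env)
    if isinstance(term, list):
        return [subst(t, phi_env) for t in term]
    return term

# each entry only mentions variables defined earlier, so recursion terminates
phi_env = {"phi1": ["graph-of", ("var", "phi2")], "phi2": ["graph-of", 42]}
print(subst(("var", "phi1"), phi_env))   # ['graph-of', ['graph-of', 42]]
```

Well-posedness corresponds to the ordering of the shredding environment: since each definition only references graph variables introduced before it, the recursion cannot loop.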

We now show that shredding preserves the semantics of the input term, in the sense that the term obtained by substituting the output shredding environment into the output term is equivalent to the input.

**Theorem 3 (Correctness of shredding).** Let Θ be well-typed and ty(Θ) ⊢ M : σ. If Φ; Θ ⊢ M ⇒ M̆ | Ψ, then, for all ρ ⊨ Θ, we have ⟦M⟧ρ = ⟦M̆Ψ⟧ρ.

Proof. By induction on the shredding judgment. We comment on two representative cases:

**–** in the set comprehension case, we want to prove

$$\begin{aligned}
& [\![\bigcup\{\{M\} \text{ where } X \mid \overrightarrow{x \leftarrow F}\}]\!]\,\rho\,v \\
={}& [\![(\varphi \circledast (\mathrm{dom}(\Theta)))\,\Psi[\varphi \mapsto \mathcal{G}(\Theta; \bigcup\{\{\breve{M}\} \text{ where } X \mid \overrightarrow{x \leftarrow F}\})]]\!]\,\rho\,v
\end{aligned}$$

where ρ ⊨ Θ. We rewrite the lhs as follows:

$$[\![\bigcup\{\{M\}\text{ where } X \mid \overrightarrow{x\leftarrow F}\}]\!]\,\rho\,v = \bigvee_{\overrightarrow{u}} ([\![M]\!]\,\rho_n = v) \wedge [\![X]\!]\,\rho_n \wedge \bigwedge_{i=1}^{n} [\![F_i]\!]\,\rho_{i-1}\,u_i$$

where ρᵢ = ρ[x₁ ↦ u₁, …, xᵢ ↦ uᵢ] ⊨ Θ, x₁ ← F₁, …, xᵢ ← Fᵢ for all i = 1, …, n, and the uᵢ are such that ⟦Fᵢ⟧ρ_{i−1} uᵢ > 0. By the definition of substitution and by Lemma 3, we rewrite the rhs:

$$\begin{array}{l}
[\![(\varphi \circledast (\mathrm{dom}(\Theta)))\,\Psi[\varphi \mapsto \mathcal{G}(\Theta; \bigcup\{\{\breve{M}\}\text{ where } X \mid \overrightarrow{x \leftarrow F}\})]]\!]\,\rho\,v \\
\quad = [\![(\mathcal{G}(\Theta; \bigcup\{\{\breve{M}\Psi\}\text{ where } X \mid \overrightarrow{x \leftarrow F}\})) \circledast (\mathrm{dom}(\Theta))]\!]\,\rho\,v \\
\quad = [\![\bigcup\{\{\breve{M}\Psi\}\text{ where } X \mid \overrightarrow{x \leftarrow F}\}]\!]\,\rho\,v \\
\quad = \bigvee_{\overrightarrow{u}} ([\![\breve{M}\Psi]\!]\,\rho_n = v) \wedge [\![X]\!]\,\rho_n \wedge \bigwedge_{i=1}^{n} [\![F_i]\!]\,\rho_{i-1}\,u_i
\end{array}$$

We can prove that, for all u⃗ such that ρₙ ⊭ Θ, x⃗ ← F⃗, we have ⋀_{i=1}^{n} ⟦Fᵢ⟧ρ_{i−1} uᵢ = 0. Therefore, we only need to consider those u⃗ such that ρₙ ⊨ Θ, x⃗ ← F⃗. Then, to prove the thesis, we only need to show:

$$[\![M]\!]\,\rho_n = [\![\breve{M}\Psi]\!]\,\rho_n$$

which follows by the induction hypothesis, since ρₙ ⊨ Θ, x⃗ ← F⃗.

**–** in the set union case, we want to prove

$$[\![\bigcup \overrightarrow{C}]\!]\,\rho\,v = [\![(\varphi \circledast (\mathrm{dom}(\Theta)))\,(\Psi \setminus \overrightarrow{\psi})[\varphi \mapsto \bigcup \overrightarrow{\Psi(\psi)}]]\!]\,\rho\,v$$

where ρ ⊨ Θ. We rewrite the lhs as follows:

$$[\![\bigcup \overrightarrow{C}]\!]\,\rho\,v = \bigvee_i [\![C_i]\!]\,\rho\,v$$

By the definition of substitution and by Lemma 3, we rewrite the rhs:

$$\begin{array}{l}
[\![(\varphi \circledast (\mathrm{dom}(\Theta)))\,(\Psi \setminus \overrightarrow{\psi})[\varphi \mapsto \bigcup \overrightarrow{\Psi(\psi)}]]\!]\,\rho\,v \\
\quad = [\![(\bigcup \overrightarrow{\Psi(\psi)})\Psi \circledast (\mathrm{dom}(\Theta))]\!]\,\rho\,v \\
\quad = \bigvee_i [\![(\Psi(\psi_i))\Psi \circledast (\mathrm{dom}(\Theta))]\!]\,\rho\,v
\end{array}$$

By induction hypothesis and unfolding of definitions, we know for all i:

$$[\![C_i]\!]\,\rho = [\![(\psi_i \circledast (\mathrm{dom}(\Theta)))\Psi]\!]\,\rho = [\![(\Psi(\psi_i))\Psi \circledast (\mathrm{dom}(\Theta))]\!]\,\rho$$

which proves the thesis. □

### **6.1 Reflecting shredded queries into NRC_λ(Set, Bag)**

The output of the shredding judgment is a stratified version of the input term, where each element of the output shredding environment provides a layer of collection nesting; furthermore, the output is ordered so that each element of the shredding environment only references graph variables defined to its left, which is convenient for evaluation. Our goal is to evaluate each shredded item as an independent query: however, these items are not immediately convertible to flat queries, partly because their type is still nested, and also due to the presence of graph operations introduced during shredding. We thus need to provide a translation operation capable of converting the output of shredding into independent flat terms of NRC_λ(Set, Bag). This translation uses two main ingredients: an abstract *index* operation, which replaces nested collections with flat references to shredded queries, and a translation of graphs into collections of pairs associating each element of the graph domain to the corresponding output.


The resulting translation, denoted by ⌊·⌋, is shown in Figure 8. Let us remark that the translation need be defined only for term forms that can be produced as the output of shredding: this allows us, for instance, not to consider terms such as ιM or M − N, which can only appear as part of flat generators of comprehensions or graphs.

We discuss briefly the interesting cases of the definition of the flattening translation. Base expressions X are expressible in NRC_λ(Set, Bag), therefore they can be mapped to themselves (this is also true for **empty**(M), since normalization ensures that the type of M is a flat collection). Graph applications

$$\begin{array}{ll}
\lfloor X \rfloor = X & \lfloor \langle\overrightarrow{\ell = M}\rangle \rfloor = \langle\overrightarrow{\ell = \lfloor M \rfloor}\rangle \\
\lfloor \bigcup \overrightarrow{C} \rfloor = \bigcup \overrightarrow{\lfloor C \rfloor} & \lfloor \biguplus \overrightarrow{D} \rfloor = \biguplus \overrightarrow{\lfloor D \rfloor} \\
\multicolumn{2}{l}{\lfloor \varphi \circledast (\overrightarrow{x}) \rfloor = \mathit{index}(\varphi, \overrightarrow{x})} \\
\multicolumn{2}{l}{\lfloor \bigcup\{\{M\}\text{ where } X \mid \overrightarrow{x \leftarrow F}\} \rfloor = \bigcup\{\{\lfloor M \rfloor\}\text{ where } X \mid \overrightarrow{x \leftarrow F}\}} \\
\multicolumn{2}{l}{\lfloor \biguplus\{\!|\{\!|M|\!\}\text{ where } X \mid \overrightarrow{x \leftarrow G}|\!\} \rfloor = \biguplus\{\!|\{\!|\lfloor M \rfloor|\!\}\text{ where } X \mid \overrightarrow{x \leftarrow G}|\!\}} \\
\multicolumn{2}{l}{\lfloor \mathcal{G}^{\mathsf{set}}(\overrightarrow{x \leftarrow F}; M) \rfloor = \bigcup\{\{\langle \overrightarrow{x}, y \rangle\} \mid \overrightarrow{x \leftarrow F},\ y \leftarrow \lfloor M \rfloor\}} \\
\multicolumn{2}{l}{\lfloor \mathcal{G}^{\mathsf{bag}}(\overrightarrow{x \leftarrow F}; M) \rfloor = \biguplus\{\!|\{\!|\langle \overrightarrow{x}, y \rangle|\!\} \mid \overrightarrow{x \leftarrow \iota(F)},\ y \leftarrow \lfloor M \rfloor|\!\}}
\end{array}$$

**Fig. 8.** Flattening embedding of shredded queries into NRC_λ(Set, Bag).

φ ⊛ (x⃗), as we said, are translated with the help of an abstract index operation: this is where the primary purpose of the translation is accomplished, by flattening a collection type to the flat type I, making it possible for a shredded query to be converted to SQL; although we do not specify the concrete implementation of index, it is worth noting that it must store the arguments of the graph application along with the (quoted) name of the graph variable φ. Tuples, unions, and comprehensions only require a recursive translation of their subterms; however, the generators of comprehensions must have a flat collection type, so no recursion is needed there. Finally, we translate graphs as collections of the pairs obtained by associating elements of the domain of the graph to the corresponding output; it is simple to come up with a comprehension term building such a collection: set-valued graphs are translated using set comprehension, while bag-valued ones use bag comprehension (this also means that in the latter case the generators for the domain of the graph, which are set-typed, must be wrapped in a ι).
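For instance, the bag-graph case of Figure 8 can be mimicked on concrete data; in the sketch below (our own representation, with a hypothetical name `flatten_bag_graph`), the set-typed domain is enumerated once, as the ι-wrapping prescribes, and each element is paired with every element of the bag body, yielding the flat collection of ⟨x⃗, y⟩ pairs that a DBMS can compute directly.

```python
def flatten_bag_graph(domain, body):
    # the set-typed domain generator is wrapped in ι, so each distinct
    # element is enumerated once, paired with every element of the bag body
    return [((x,), y) for x in sorted(domain) for y in body(x)]

inner = flatten_bag_graph({1, 2}, lambda x: [x * 10] * x)
print(inner)   # [((1,), 10), ((2,), 20), ((2,), 20)]
```

The result is exactly the shape of data later stored in the shredded value set Ξ.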

We can prove that the flattening embedding produces flat-typed terms, as expected.

**Definition 4.** A well-typed set comprehension generator Θ is flat-typed if, and only if, for all x ∈ dom(Θ), there exists a flat type σ such that ty(Θ(x)) = {σ}.

A well-typed shredding environment Φ is flat-typed if, and only if, for all φ ∈ dom(Φ), we have that ty(⌊Φ(φ)⌋) is a flat collection type.

**Lemma 4.** Suppose Φ; Θ ⊢ M ⇒ M̆ | Ψ, where Φ and Θ are flat-typed. Then M̆ and Ψ are also flat-typed.

It is important to note that the composition of shredding and ⌊·⌋ does not produce normalized NRC_λ(Set, Bag) terms: when we shred a comprehension, we add to the output shredding environment a graph returning a comprehension, and when we translate this to NRC_λ(Set, Bag) we get two nested comprehensions:

$$\lfloor \mathcal{G}(x \leftarrow \delta t; \biguplus\{\!|\{\!|\breve{M}|\!\} \mid y \leftarrow \iota Q^{*}|\!\}) \rfloor = \biguplus\{\!|\{\!|\langle x, z \rangle|\!\} \mid x \leftarrow \iota\delta t,\ z \leftarrow \biguplus\{\!|\{\!|\lfloor \breve{M} \rfloor|\!\} \mid y \leftarrow \iota Q^{*}|\!\}|\!\}$$

$$\begin{array}{ll}
\langle\!\langle X : b \rangle\!\rangle\,\Xi \triangleq X & (\text{if } X \text{ is not an index}) \\[2pt]
\langle\!\langle \langle\overrightarrow{\ell = N}\rangle : \langle\overrightarrow{\ell : \tau}\rangle \rangle\!\rangle\,\Xi \triangleq \langle\overrightarrow{\ell = \langle\!\langle N : \tau \rangle\!\rangle\,\Xi}\rangle \\[2pt]
\langle\!\langle \mathit{index}(\varphi, \overrightarrow{V}) : \{\tau\} \rangle\!\rangle\,\Xi \triangleq \bigcup\{\, \langle\!\langle p.2 : \tau \rangle\!\rangle\,\Xi \mid p \leftarrow \Xi(\varphi),\ p.1 = \langle\overrightarrow{V}\rangle \,\} \\[2pt]
\langle\!\langle \mathit{index}(\varphi, \overrightarrow{V}) : \{\!|\tau|\!\} \rangle\!\rangle\,\Xi \triangleq \biguplus\{\!|\, \langle\!\langle p.2 : \tau \rangle\!\rangle\,\Xi \mid p \leftarrow \Xi(\varphi),\ p.1 = \langle\overrightarrow{V}\rangle \,|\!\}
\end{array}$$

**Fig. 9.** The stitching function.

In fact, not only is this term not in normal form, but it may even contain, within Q∗, a lateral reference to x; thus, after a flattening translation, we will always require the resulting queries to be renormalized and, if needed, delateralized.

Let norm denote NRC_λ(Set, Bag) normalization, and S denote the evaluation of relational normal forms: we define the shredded value set Ξ corresponding to a shredding environment Φ as follows:

$$\Xi \triangleq \{ \varphi \mapsto \mathcal{S}(norm(\lfloor M \rfloor)) | [\varphi \mapsto M] \in \Phi \}$$

The evaluation S is ordinarily performed by a DBMS, after converting the NRC_λ(Set, Bag) query to SQL as described in Section 5. The result of this evaluation is reflected in a programming language such as Links as a list of records.

### **6.2 The stitching function**

Given an NRC_λ(Set, Bag) term with nested collections, we have first shredded it, obtaining a shredded NRC_G term M̆ and a shredding environment Φ containing NRC_G graphs; then we have used the flattening embedding to reflect both M̆ and Φ back into the flat fragment of NRC_λ(Set, Bag); next, we used normalization and DBMS evaluation to convert the shredding environment into a shredded value set Ξ. As the last step to evaluate M : τ, we need to combine M̆ and Ξ together to reconstruct the correct nested value ⟨⟨M̆ : τ⟩⟩ Ξ by stitching together partial flat values.

The stitching function is shown in Figure 9: its job is to visit all the components of tuples and collections, ignoring atomic values other than indexes along the way. The real work is performed when an index(φ, V⃗) is found: conceptually, the index should be replaced by the result of the evaluation of φ ⊛ (V⃗). Remember that Ξ contains the result of the evaluation of the graph function φ after translation to NRC_λ(Set, Bag), i.e. a collection of pairs associating each input of φ to the corresponding output: then, to obtain the desired result, we can take Ξ(φ), filter all the pairs p whose first component is ⟨V⃗⟩, and return the second component of each such p after a recursive stitching. Finally, observe that we track the result type argument in order to disambiguate whether to construct a set or a multiset when we encounter an index.
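The stitching function can be sketched as a short recursive procedure. In the illustration below (our own data encoding, not the Links implementation), an index is a tagged tuple, Ξ maps each graph variable to the list of (key, value) pairs produced by its flat query, and lists model both sets and bags, eliding the type-directed disambiguation.

```python
def stitch(value, xi):
    if isinstance(value, tuple) and value[:1] == ("index",):
        _, phi, key = value
        # replace the index by the stitched outputs stored under this key
        return [stitch(v, xi) for (k, v) in xi[phi] if k == key]
    if isinstance(value, dict):    # records: stitch each field
        return {l: stitch(v, xi) for l, v in value.items()}
    if isinstance(value, list):    # collections: stitch each element
        return [stitch(v, xi) for v in value]
    return value                   # atoms are left unchanged

xi = {"phi": [((1,), {"c": 10}), ((1,), {"c": 11}), ((2,), {"c": 20})]}
outer = [{"a": 1, "b": ("index", "phi", (1,))},
         {"a": 2, "b": ("index", "phi", (2,))}]
print(stitch(outer, xi))
```

Here the outer flat query and the graph query for φ are evaluated independently, and stitching reassembles the nested result by joining on the stored keys.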

**Theorem 4 (Correctness of stitching).** Let Θ be well-typed and ty(Θ) ⊢ M : σ. Let Φ be well-typed, and suppose Φ; Θ ⊢ M ⇒ M̆ | Ψ. Let Ξ be the result of evaluating the flattened queries in Ψ as above. Then ⟦M̆Ψ⟧ρ = ⟨⟨M̆ : σ⟩⟩ Ξ ρ.

The full correctness result follows by combining Theorems 3 and 4.

**Corollary 1.** For all M such that ⊢ M : τ, suppose M ⇒ M̆ | Ψ (starting from empty Φ and Θ), and let Ξ be the shredded value set obtained by evaluating the flattened queries in Ψ. Then ⟦M⟧ = ⟨⟨M̆ : τ⟩⟩ Ξ.

# **7 Related work**

Work on language-integrated query and comprehension syntax has taken place over several decades in both the database and programming language communities. We discuss the most closely related work below.

Comprehensions, normalization and language integration The database community had already begun in the late 1980s to explore proposals for so-called non-first-normal-form relations in which collections could be nested inside other collections [46], but following Trinder and Wadler's initial work connecting database queries with monadic comprehensions [50], query languages based on these foundations were studied extensively, particularly by Buneman et al. [4,3]. For our purposes, Wong's work on query normalization and translation to SQL [55] is the most important landmark; this work provided the basis for practical implementations such as Kleisli and later Links. Almost as important is the later work by Libkin and Wong [33], studying the questions of expressiveness of bag query languages via a language BQL that extended basic NRC with deduplication and bag difference operators. They related this language to NRC with set semantics extended with aggregation (count/sum) operations, but did not directly address the question of normalizing and translating BQL queries to SQL. Grust and Scholl [28] were early advocates of the use of comprehensions mixing set, bag and other monadic collections for query rewriting and optimization, but did not study normalization or translatability properties.

Although comprehension-based queries began to be used in general-purpose programming languages with the advent of Microsoft LINQ [36] and Links [12], Cooper [11] made the next important foundational contribution by extending Wong's normalization result to queries containing higher-order functions and showing that an effect system could be used to safely compose queries using higher-order functions even in an ambient language with side-effects and recursive functions that cannot be used in queries. This work provided the basis for subsequent development of language-integrated query in Links [34] and was later adapted for use in F# [7], Scala [41], and by Kiselyov et al. [48] in the OCaml library QueΛ. However, on revisiting Cooper's proof to extend it to heterogeneous queries, we found a subtle gap in the proof, which was corrected in a recent paper [43]; the original result was correct. As a result, in this paper we focus on first-order fragments of these languages without loss of generality.

Giorgidze et al. [22] have shown how to support non-recursive datatypes (i.e. sums) and Grust and Ulrich [29] built on this to show how to support function types in query results using defunctionalization. We considered using sums to support a defunctionalization-style strategy for query lifting, but Giorgidze et al. [22] map sum types to nested collections, which makes their approach unsuitable to our setting. Wong's original normalization result also considered sum types, but to the best of our knowledge normalization for NRC_λ(Set, Bag) extended with sum types has not yet been proved.

Recent work by Suzuki et al. [48] has outlined further extensions to language-integrated query in the QueΛ system, which is based on finally-tagless syntax [6] and employs Wong's and Cooper's rewrite rules; Katsushima and Kiselyov's subsequent short paper [31] outlined extensions to handling ordering and grouping. Kiselyov and Katsushima [32] present an extension to QueΛ called Squr to handle ordering based on effect typing, and they provide an elegant translation from Squr queries to SQL based on normalization-by-evaluation. Okura and Kameyama [39] outline an extension to handle SQL-style grouping and aggregation operators in QueΛ_G; however, their approach potentially generates lateral variable occurrences inside grouping queries. None of these systems (QueΛ, Squr, QueΛ_G) considers heterogeneity or nested results.

Our adoption of tabulated functions (graphs) is inspired in part by Gibbons et al. [20], who provided an elegant rational reconstruction of relational algebra showing how standard principles for reasoning about queries arise from adjunctions. They employed types for (finite) maps and tables to show how joins can be implemented efficiently, and observed that such structures form a graded monad. We are interested in further exploring these structures and extending our work to cover ordering, grouping and aggregation.

**Query decorrelation and delateralization** There is a large literature on query decorrelation, for example to remove aggregation operations from SELECT or WHERE clauses (see e.g. [38,5] for further discussion). Delateralization appears related to decorrelation, but we are aware of only a few works on this problem, perhaps because most DBMSs have only started to support LATERAL in the last few years. (Microsoft SQL Server has supported similar functionality for much longer through the APPLY keyword.) Our delateralization technique appears most closely related to Neumann and Kemper's work on query unnesting [38]. In this context, unnesting refers to the removal of "dependent join" expressions in a relational-algebraic query language; such joins appear to correspond to lateral subqueries. This approach is implemented in the HyPer database system, but it is not accompanied by a proof of correctness, nor does it handle nested query results. It would be interesting to formalize this approach (or others from the decorrelation literature) and relate it to delateralization.
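To make the lateral/delateralized distinction concrete, here is a minimal sketch using in-memory Python lists as stand-in tables (the table names and contents are invented for illustration; this is not Neumann and Kemper's algorithm). The lateral form's inner comprehension mentions the outer variable, while the delateralized form replaces it with an independent join plus an equality condition.

```python
# Hypothetical illustration of delateralization on in-memory tables.
# A "lateral" subquery mentions a variable bound by the outer query;
# delateralization rewrites it as an ordinary join on that variable.

depts = [{"id": 1, "name": "R&D"}, {"id": 2, "name": "Sales"}]
emps = [{"dept": 1, "name": "Ada"}, {"dept": 1, "name": "Bob"},
        {"dept": 2, "name": "Cyn"}]

# Lateral form: the inner comprehension depends on the outer variable d.
lateral = [(d["name"], e["name"])
           for d in depts
           for e in [x for x in emps if x["dept"] == d["id"]]]

# Delateralized form: an independent join with an explicit equality condition.
delateralized = [(d["name"], e["name"])
                 for d in depts
                 for e in emps
                 if e["dept"] == d["id"]]

assert lateral == delateralized
```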

**Querying nested collections** Our approach to querying nested heterogeneous collections clearly specializes to the homogeneous cases for sets and multisets respectively, which have been studied separately. Van den Bussche's work on simulating queries on nested sets using flat ones [54] has also inspired subsequent work on query shredding, flattening and (in this paper) lifting, though the simulation technique itself does not appear practical (as discussed in the extended version of Cheney et al. [9]). More recently, Benedikt and Pradic [1] presented results on representing queries on nested collections using a bounded number of interpretations (first-order logic formulas corresponding to definable flat query expressions) in the context of their work on synthesizing NRC queries from proofs. This approach considers set-valued NRC only, and its relationship to our approach should be investigated further.

Cheney et al.'s previous work on query shredding for multiset queries [8] differs in several important respects. In that work we did not consider the deduplication and bag difference operations of BQL, which Libkin and Wong showed cannot be expressed in terms of the other NRC operations. The shredding translation was given in several stages, and while each stage is individually comprehensible, the overall approach is not easy to understand. Finally, the last stages of the translation relied on SQL features not present (or expressible) in the source language, such as ordering and the SQL:1999 ROW_NUMBER construct, to synthesize uniform integer keys. Our approach, in contrast, handles set, bag, and mixed queries, and does not rely on any SQL:1999 features.

In a parallel line of work, Grust et al. [26,21,51,53,52] have developed a number of approaches to querying nested list data structures, first in the context of XML processing [24] and subsequently for NRC-like languages over lists. The earlier approach [26], named loop-lifting (not to be confused with query lifting!), made heavy use of SQL:1999 capabilities for numbering and indexing to decouple nested collections from their context; it was implemented in both Links [51] and earlier versions of the Database Supported Haskell (DSH) library [21], both of which relied on an advanced query optimizer called Pathfinder [27] to optimize these queries. The more recent approach, implemented by Ulrich in the current version of DSH and described in detail in his thesis [52], is called query flattening and is instead based on techniques from nested data parallelism [2]. Both loop-lifting and query flattening are very powerful: they do not rely on an initial normalization stage, while supporting a rich source language with list semantics, ordering, grouping, aggregation, and deduplication, which can in principle emulate set or multiset semantics. However, to the best of our knowledge no correctness proofs exist for either technique. We view finding correctness results for richer query languages as an important challenge for future work.

Another parallel line of work, started by Fegaras and Maier [15,14], considers heterogeneous query languages based on monoid comprehensions, with set, list, and bag collections as well as grouping, aggregation and ordering operations, in the setting of object-oriented databases; it forms the basis for complex-object database systems such as λDB [16] and Apache MRQL [14]. However, Wong-style normalization results or translations from flat or nested queries to SQL are not known for these calculi.

**Lambda-lifting and closure conversion** Since Johnsson's original work [30], lambda-lifting and closure conversion have been studied extensively for functional languages, with Minamide et al.'s typed closure conversion [37] of particular interest for compilers employing typed intermediate languages. We plan to study whether known optimizations from the lambda-lifting and closure conversion literature offer advantages for query lifting. The most important immediate next step is to implement our approach and compare it empirically with previous techniques such as query shredding and query flattening. By analogy with lambda-lifting and closure conversion, we expect additional optimizations to be possible through a deeper analysis of how variables/fields are used in lifted subqueries. Another problem we have not resolved is how to deal with deduplication or bag difference at nested collection types in practice. Libkin and Wong [33] showed that such nesting can be eliminated from BQL queries, but their results do not provide a constructive algorithm for eliminating it.

# **8 Conclusions**

Monadic comprehensions have proved to be a remarkably durable foundation for database programming and language-integrated query, and have led to language support (LINQ for .NET, Quill for Scala) with widespread adoption. Recent work has demonstrated that techniques for evaluating queries over nested collections, such as query shredding or query flattening, can offer order-of-magnitude speedups in database applications [19] without sacrificing declarativity or readability. However, query shredding cannot express common operations such as deduplication, while query flattening is more expressive but lacks a detailed proof of correctness, and both techniques are challenging to understand, implement, and extend. We provide the first provably correct approach to querying nested heterogeneous collections involving both sets and multisets.

Our most important insight is that working in a heterogeneous language, with both set and multiset collection types, actually makes the problem easier: it becomes possible to calculate finite maps representing the behavior of nested query subexpressions under all of the possible environments encountered at run time. Thus, instead of having to maintain or synthesize keys linking inner and outer collections, as is done in all previous approaches, we can use the values of the variables in the closures of nested query expressions themselves as the keys. The same approach can be used to eliminate sideways information-passing. This is analogous to lambda-lifting or closure conversion in the compilation of functional languages, but differs in that we lift local queries to (queries that compute) finite maps rather than ordinary function abstractions. We believe this idea may have broader applications, and we will next investigate its behavior in practice and its applications to other query language features.
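The closure-as-key idea can be illustrated with a schematic Python sketch (the tables and the dictionary encoding are invented for illustration; this is not the paper's formal translation). Instead of synthesizing integer keys to link inner and outer collections, we tabulate the nested subquery as a finite map keyed by the values of its free variables, i.e. by its closure environment.

```python
# Schematic sketch (not the paper's formal translation): instead of
# synthesizing integer keys linking outer and inner collections, tabulate
# the nested subquery as a finite map keyed by the values of its free
# variables -- i.e., by its closure environment.

orders = [{"cust": "alice", "item": "pen"},
          {"cust": "alice", "item": "ink"},
          {"cust": "bob", "item": "pad"}]
customers = ["alice", "bob"]

# Lift the inner query "items of customer c" to a finite map over all
# environments (values of c) actually encountered by the outer query.
graph = {c: [o["item"] for o in orders if o["cust"] == c]
         for c in customers}

# The nested result is assembled by a lookup rather than a key join.
nested = [(c, graph[c]) for c in customers]
assert nested == [("alice", ["pen", "ink"]), ("bob", ["pad"])]
```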

**Acknowledgments** This work was supported by ERC Consolidator Grant Skye (grant number 682315), and by an ISCF Metrology Fellowship grant provided by the UK government's Department for Business, Energy and Industrial Strategy (BEIS). We are grateful to Simon Fowler for feedback and to the anonymous reviewers for their constructive comments.

# **References**



# **Reverse AD at Higher Types: Pure, Principled and Denotationally Correct**

Matthijs Vákár

Utrecht University, Utrecht, Netherlands m.i.l.vakar@uu.nl

**Abstract.** We show how to define forward- and reverse-mode automatic differentiation source-code transformations on a standard higher-order functional language. The transformations generate purely functional code, and they are principled in the sense that their definition arises from a categorical universal property. We give a semantic proof of correctness of the transformations. In their most elegant formulation, the transformations generate code with linear types. However, we demonstrate how the transformations can be implemented in a standard functional language without sacrificing correctness. To do so, we make use of abstract data types to represent the required linear types, e.g. through the use of a basic module system.

**Keywords:** automatic differentiation · program correctness · semantics.

# **1 Introduction**

Automatic differentiation (AD) is a technique for transforming code that implements a function f into code that computes f's derivative, essentially by using the chain rule for derivatives. Due to its efficiency and numerical stability, AD is the technique of choice whenever derivatives need to be computed of functions that are implemented as programs, particularly in high dimensional settings. Optimization and Monte-Carlo integration algorithms, such as gradient descent and Hamiltonian Monte-Carlo methods, rely crucially on the calculation of derivatives. These algorithms are used in virtually every machine learning and computational statistics application, and the calculation of derivatives is usually the computational bottleneck. These applications explain the recent surge of interest in AD, which has resulted in the proliferation of popular AD systems such as TensorFlow [1], PyTorch [30], and Stan Math [9].

AD, roughly speaking, comes in two modes: forward mode and reverse mode. When differentiating a function $\mathbf{R}^n \to \mathbf{R}^m$, forward mode tends to be more efficient if $m > n$, while reverse mode generally is more performant if $n > m$. As most applications reduce to optimization or Monte-Carlo integration of an objective function $\mathbf{R}^n \to \mathbf{R}$ with $n$ very large (today, on the order of $10^4$ to $10^7$), reverse-mode AD is in many ways the more interesting algorithm.

However, reverse AD is also more complicated to understand and implement than forward AD. Forward AD can be implemented as a structure-preserving program transformation, even on languages with complex features [32]. As such, it admits an elegant proof of correctness [20]. By contrast, reverse AD is only well-understood as a source-code transformation (also called define-then-run AD) on limited programming languages. Typically, its implementations for more expressive languages with features such as higher-order functions use define-by-run approaches: they first build a computation graph at runtime, effectively evaluating the program until only a straight-line first-order program is left, and then evaluate this new program [30,9]. Such approaches have the severe downside that the differentiated code cannot benefit from existing optimizing compiler architectures. As such, these AD libraries need to be implemented using carefully hand-optimized code that, for example, contains no common subexpressions. This implementation process is precarious and labour-intensive. Further, some whole-program optimizations that a compiler would detect go entirely unused in such systems.

Similarly, correctness proofs of reverse AD have taken a define-by-run approach and have relied on non-standard operational semantics, using forms of symbolic execution [2,28,8]. Most work that treats reverse AD as a source-code transformation does so by means of complex transformations that introduce mutable state and/or non-local control flow [31,38]. As a result, it is unclear whether and why such techniques are correct. Another approach has been to compile high-level languages to a low-level imperative representation first, and then to perform AD at that level [22], using mutation and jumps. This approach has the downside that we might lose important opportunities for compiler optimizations, such as map-fusion and embarrassingly parallel maps, which we can exploit if we perform define-then-run AD on a high-level representation.

A notable exception to these define-by-run and non-functional approaches to AD is [16], which presents an elegant, purely functional, define-then-run version of reverse AD. Unfortunately, its techniques are limited to first-order programs over tuples of real numbers. This paper extends the work of [16] to apply to higher-order programs over (primitive) arrays of reals:


# **2 Key Ideas**

Consider a simple programming language. Types are statically sized arrays $\mathbf{real}^n$ for some $n$, and programs are obtained by sequencing from a collection of (unary) primitive operations $x : \mathbf{real}^n \vdash \mathsf{op}(x) : \mathbf{real}^m$ (intended to implement differentiable functions such as linear algebra operations and sigmoid functions).

We can implement both forward-mode AD $\overrightarrow{\mathcal{D}}$ and reverse-mode AD $\overleftarrow{\mathcal{D}}$ on this language as source-code translations into the larger language of a simply typed λ-calculus over the ground types $\mathbf{real}^n$ that includes at least the same operations. Forward (resp. reverse) AD translates a type $\tau$ to a pair of types $\overrightarrow{\mathcal{D}}(\tau) = (\overrightarrow{\mathcal{D}}(\tau)_1, \overrightarrow{\mathcal{D}}(\tau)_2)$ (resp. $\overleftarrow{\mathcal{D}}(\tau) = (\overleftarrow{\mathcal{D}}(\tau)_1, \overleftarrow{\mathcal{D}}(\tau)_2)$): the first component holds function values, also called primals in the AD literature; the second holds derivative values, also called tangents (resp. adjoints or cotangents):

$$
\overrightarrow{\mathcal{D}}(\mathbf{real}^n) \overset{\text{def}}{=} \overleftarrow{\mathcal{D}}(\mathbf{real}^n) = (\mathbf{real}^n, \mathbf{real}^n).
$$

We translate terms $x : \tau \vdash t : \sigma$ to pairs of terms $\overrightarrow{\mathcal{D}}(t) = (\overrightarrow{\mathcal{D}}(t)_1, \overrightarrow{\mathcal{D}}(t)_2)$ for forward AD and $\overleftarrow{\mathcal{D}}(t) = (\overleftarrow{\mathcal{D}}(t)_1, \overleftarrow{\mathcal{D}}(t)_2)$ for reverse AD, which have types

$$\begin{array}{llll} x: \overrightarrow{\mathcal{D}}(\tau)\_{1} \vdash \overrightarrow{\mathcal{D}}(t)\_{1}: \overrightarrow{\mathcal{D}}(\sigma)\_{1} & \text{and} & x: \overleftarrow{\mathcal{D}}(\tau)\_{1} \vdash \overleftarrow{\mathcal{D}}(t)\_{1}: \overleftarrow{\mathcal{D}}(\sigma)\_{1} \\ x: \overrightarrow{\mathcal{D}}(\tau)\_{1} \vdash \overrightarrow{\mathcal{D}}(t)\_{2}: \overrightarrow{\mathcal{D}}(\tau)\_{2} \rightarrow \overrightarrow{\mathcal{D}}(\sigma)\_{2} & & x: \overleftarrow{\mathcal{D}}(\tau)\_{1} \vdash \overleftarrow{\mathcal{D}}(t)\_{2}: \overleftarrow{\mathcal{D}}(\sigma)\_{2} \rightarrow \overleftarrow{\mathcal{D}}(\tau)\_{2}. \end{array}$$

$\overrightarrow{\mathcal{D}}(t)_1$ and $\overleftarrow{\mathcal{D}}(t)_1$ perform the primal computations for the program $t$, while $\overrightarrow{\mathcal{D}}(t)_2$ and $\overleftarrow{\mathcal{D}}(t)_2$ compute the derivatives, respectively, for forward and reverse AD.

Indeed, we define, by induction on the syntax:

$$\begin{split} \overrightarrow{\mathcal{D}}(x) & \stackrel{\text{def}}{=} \overleftarrow{\mathcal{D}}(x) \stackrel{\text{def}}{=} (x, \lambda y. y) \quad \overrightarrow{\mathcal{D}}(\mathsf{op}(t))\_{1} \stackrel{\text{def}}{=} \mathsf{op}(\overrightarrow{\mathcal{D}}(t)\_{1}) \quad \overleftarrow{\mathcal{D}}(\mathsf{op}(t))\_{1} \stackrel{\text{def}}{=} \mathsf{op}(\overleftarrow{\mathcal{D}}(t)\_{1}) \\ \overrightarrow{\mathcal{D}}(\mathsf{op}(t))\_{2} & \stackrel{\text{def}}{=} \lambda y. (D\textbf{op})(\overrightarrow{\mathcal{D}}(t)\_{1}) \left(\overrightarrow{\mathcal{D}}(t)\_{2} \, y\right) \quad \overleftarrow{\mathcal{D}}(\mathsf{op}(t))\_{2} \stackrel{\text{def}}{=} \lambda y. \overleftarrow{\mathcal{D}}(t)\_{2} \left((D\textbf{op})^{t}(\overleftarrow{\mathcal{D}}(t)\_{1}) \, y\right), \end{split}$$

where we assume that we have chosen suitable terms $x : \mathbf{real}^n \vdash (D\mathsf{op})(x) : \mathbf{real}^n \to \mathbf{real}^m$ and $x : \mathbf{real}^n \vdash (D\mathsf{op})^t(x) : \mathbf{real}^m \to \mathbf{real}^n$ to represent the (multivariate) derivative and transposed (multivariate) derivative, respectively, of the primitive operation $\mathsf{op} : \mathbf{real}^n \to \mathbf{real}^m$.

For example, in the case of multiplication $x : \mathbf{real}^2 \vdash \mathsf{op}(x) = (*)(x) : \mathbf{real}$, we can choose $D(*)(x) = \lambda y : \mathbf{real}^2.\ \mathbf{swap}(x) \bullet y$ and $(D(*))^t(x) = \lambda y : \mathbf{real}.\ y \cdot \mathbf{swap}(x)$, where $\mathbf{swap}$ is a unary operation on $\mathbf{real}^2$ that swaps the two components, $(\bullet)$ is a binary inner product operation on $\mathbf{real}^2$, and $(\cdot)$ is a binary scalar product operation for rescaling a vector in $\mathbf{real}^2$ by a real number.
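As a sanity check on these choices, the following Python sketch (an illustration under the obvious identification of $\mathbf{real}^2$ with pairs of floats; the helper names are ours, not the paper's) implements $D(*)$ and $(D(*))^t$ and checks them against a finite-difference approximation and the defining transpose property $\langle D(*)(x)(y), v\rangle = \langle y, (D(*))^t(x)(v)\rangle$.

```python
# Illustration (our encoding) of the derivative and transposed derivative
# of binary multiplication: for x = (x1, x2),
#   D(*)(x)(y)   = swap(x) . y = x2*y1 + x1*y2   (directional derivative)
#   D(*)^t(x)(y) = y * swap(x) = (x2*y, x1*y)    (adjoint)

def d_mul(x, y):          # D(*): tangent y in R^2 |-> tangent in R
    x1, x2 = x
    y1, y2 = y
    return x2 * y1 + x1 * y2

def d_mul_t(x, y):        # D(*)^t: adjoint y in R |-> adjoint in R^2
    x1, x2 = x
    return (x2 * y, x1 * y)

# Sanity check against a finite-difference approximation of x1*x2.
x, y = (3.0, 4.0), (1.0, 2.0)
eps = 1e-6
fd = ((x[0] + eps * y[0]) * (x[1] + eps * y[1]) - x[0] * x[1]) / eps
assert abs(d_mul(x, y) - fd) < 1e-4

# The transpose property: <D(*)(x)(y), v> = <y, D(*)^t(x)(v)>.
v = 5.0
lhs = d_mul(x, y) * v
rhs = y[0] * d_mul_t(x, v)[0] + y[1] * d_mul_t(x, v)[1]
assert abs(lhs - rhs) < 1e-9
```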

To illustrate the difference between $\overrightarrow{\mathcal{D}}$ and $\overleftarrow{\mathcal{D}}$, consider the program $t = \mathsf{op}_2(\mathsf{op}_1(x))$ performing two operations in sequence. Then $\overrightarrow{\mathcal{D}}(t)_1 = \mathsf{op}_2(\mathsf{op}_1(x)) = \overleftarrow{\mathcal{D}}(t)_1$ and (after β-reducing, for legibility)

$$\begin{aligned} \overrightarrow{\mathcal{D}}(t)\_2 &= \lambda y. (D\mathsf{op}\_2)(\mathsf{op}\_1(x))((D\mathsf{op}\_1)(x)(y)) \\ \overleftarrow{\mathcal{D}}(t)\_2 &= \lambda y. (D\mathsf{op}\_1)^t(x)((D\mathsf{op}\_2)^t(\mathsf{op}\_1(x))(y)) .\end{aligned}$$

In general, $\overrightarrow{\mathcal{D}}$ computes the derivative of a program that is a composition of operations $\mathsf{op}_1, \ldots, \mathsf{op}_n$ as the composition $(D\mathsf{op}_1), \ldots, (D\mathsf{op}_n)$ of their (multivariate) derivatives, in the same order as the original computation. By contrast, $\overleftarrow{\mathcal{D}}$ computes the transposed derivative of such a composition as the composition of the transposed derivatives $(D\mathsf{op}_n)^t, \ldots, (D\mathsf{op}_1)^t$. Observe the reversed order compared to the original composition!
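The order reversal can be observed concretely in the following Python sketch, which instantiates $\mathsf{op}_1$ and $\mathsf{op}_2$ with sine and squaring on plain floats (our choice of operations, not the paper's). On $\mathbf{R} \to \mathbf{R}$ the forward and reverse results coincide as scalars, but the two compositions are built in opposite orders.

```python
# Sketch of the translation on a two-operation program t = op2(op1(x)),
# with ops on plain floats for readability (our choice, not the paper's):
# op1 = sin, op2 = squaring.

import math

def op1(x): return math.sin(x)
def d_op1(x, y): return math.cos(x) * y     # D(op1)(x)(y)
def d_op1_t(x, y): return math.cos(x) * y   # (D(op1))^t; self-transposed on R

def op2(x): return x * x
def d_op2(x, y): return 2 * x * y
def d_op2_t(x, y): return 2 * x * y

x = 0.5
primal = op2(op1(x))                        # shared by both modes

# Forward mode: derivatives composed in the original order.
def fwd(y): return d_op2(op1(x), d_op1(x, y))

# Reverse mode: transposed derivatives composed in the reversed order.
def rev(y): return d_op1_t(x, d_op2_t(op1(x), y))

# On R -> R both compute the same scalar derivative 2*sin(x)*cos(x).
assert abs(fwd(1.0) - 2 * math.sin(x) * math.cos(x)) < 1e-12
assert abs(rev(1.0) - fwd(1.0)) < 1e-12
```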

While this AD technique works on the limited first-order language we described, it is far from satisfactory. Notably, it has the following two shortcomings:


The key contributions of this paper are its extension of this transformation (see §7) to apply to a full simply typed λ-calculus (of §3), and its proof that this transformation is correct (see §8).

Shortcoming 1 seems easy to address, at first sight. Indeed, as the (co)tangent vectors to a product of spaces are simply tuples of (co)tangent vectors, one would expect to define, for a product type $\tau \ast \sigma$,

$$
\overrightarrow{\mathcal{D}}(\tau \ast \sigma) \stackrel{\text{def}}{=} (\overrightarrow{\mathcal{D}}(\tau)_1 \ast \overrightarrow{\mathcal{D}}(\sigma)_1,\ \overrightarrow{\mathcal{D}}(\tau)_2 \ast \overrightarrow{\mathcal{D}}(\sigma)_2) \qquad \overleftarrow{\mathcal{D}}(\tau \ast \sigma) \stackrel{\text{def}}{=} (\overleftarrow{\mathcal{D}}(\tau)_1 \ast \overleftarrow{\mathcal{D}}(\sigma)_1,\ \overleftarrow{\mathcal{D}}(\tau)_2 \ast \overleftarrow{\mathcal{D}}(\sigma)_2).
$$

Indeed, this technique straightforwardly applies to forward-mode AD:

$$\begin{split} \overrightarrow{\mathcal{D}}\left(\langle t,s\rangle\right) & \stackrel{\text{def}}{=} \left(\langle\overrightarrow{\mathcal{D}}(t)\_{1},\overrightarrow{\mathcal{D}}(s)\_{1}\rangle,\lambda y.\langle\overrightarrow{\mathcal{D}}(t)\_{2}(y),\overrightarrow{\mathcal{D}}(s)\_{2}(y)\rangle\right) \\ \overrightarrow{\mathcal{D}}\left(\mathbf{f}\mathbf{st}\,t\right) & \stackrel{\text{def}}{=} \left(\mathbf{f}\mathbf{st}\,\overrightarrow{\mathcal{D}}(t)\_{1},\lambda y.\mathbf{f}\mathbf{st}\,\overrightarrow{\mathcal{D}}(t)\_{2}(y)\right) \end{split} \qquad \overrightarrow{\mathcal{D}}\left(\mathbf{snd}\,t\right) \stackrel{\text{def}}{=} \left(\mathbf{snd}\,\overrightarrow{\mathcal{D}}(t)\_{1},\lambda y.\mathbf{snd}\,\overrightarrow{\mathcal{D}}(t)\_{2}(y)\right). \end{split}$$

For reverse-mode AD, however, tuples already present challenges. Indeed, we would like to use the definitions below, but they require terms $\underline{0} : \tau$ and $t + s : \tau$ for any two $t, s : \tau$ at each type $\tau$:

$$
\begin{aligned}
\overleftarrow{\mathcal{D}}(\langle t, s\rangle) &\stackrel{\text{def}}{=} (\langle\overleftarrow{\mathcal{D}}(t)_1, \overleftarrow{\mathcal{D}}(s)_1\rangle,\ \lambda y.\ \overleftarrow{\mathcal{D}}(t)_2(\mathbf{fst}\ y) + \overleftarrow{\mathcal{D}}(s)_2(\mathbf{snd}\ y)) \\
\overleftarrow{\mathcal{D}}(\mathbf{fst}\ t) &\stackrel{\text{def}}{=} (\mathbf{fst}\ \overleftarrow{\mathcal{D}}(t)_1,\ \lambda y.\ \overleftarrow{\mathcal{D}}(t)_2\langle y, \underline{0}\rangle) \qquad \overleftarrow{\mathcal{D}}(\mathbf{snd}\ t) \stackrel{\text{def}}{=} (\mathbf{snd}\ \overleftarrow{\mathcal{D}}(t)_1,\ \lambda y.\ \overleftarrow{\mathcal{D}}(t)_2\langle \underline{0}, y\rangle).
\end{aligned}
$$

These formulae capture the well-known issue of fan-out translating to addition in reverse AD, caused by the contravariance of its second component [31]. Such $\underline{0}$ and $+$ could indeed be defined by induction on the structure of types, using $\underline{0}$ and $+$ at $\mathbf{real}^n$. However, more problematically, $\langle -, - \rangle$, $\mathbf{fst}\ -$ and $\mathbf{snd}\ -$ represent explicit uses of the structural rules of contraction and weakening at types $\tau$, which, in a λ-calculus, can also be used implicitly in the typing context $\Gamma$. Thus, we should also make these implicit uses explicit to account for their presence in the code. Then, we can appropriately translate them into their "mirror image": we map the contraction-weakening comonoids to the monoid structures $(+, \underline{0})$.
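A concrete, deliberately simplified Python illustration of this mirroring (a scalar setting of our choosing, not the paper's combinator translation): duplicating a variable (contraction) forces the reverse derivative to add the adjoints arriving along the two copies, recovering the familiar rule $\mathrm{d}(x \cdot x)/\mathrm{d}x = 2x$.

```python
# Sketch of "fan-out becomes addition" (our simplified scalar setting,
# not the paper's combinator translation).  The program t = <x, x>
# duplicates its input, so its reverse derivative must ADD the adjoints
# flowing back along both copies.

# Reverse derivative of pairing: lambda y. D(t)_2(fst y) + D(s)_2(snd y);
# here t = s = x, whose reverse derivative is the identity.
def dup_rev(y):
    return y[0] + y[1]

# Consequence: for f(x) = x * x written as mul(<x, x>), the adjoint of x
# collects one contribution per use, giving the familiar 2*x.
x = 3.0

def mul_rev(x_pair, y):      # transposed derivative of multiplication
    return (x_pair[1] * y, x_pair[0] * y)

adj_x = dup_rev(mul_rev((x, x), 1.0))
assert adj_x == 2 * x        # d(x*x)/dx = 2x
```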

**Insight 1.** In functional define-then-run reverse AD, we need to make use of explicit structural rules and "mirror them", which we can do by first translating our language into combinators. This translation allows us to avoid the usual practice (e.g. [38]) of accumulating adjoints at run-time with mutable state: instead, we detect all adjoints to accumulate at compile-time.

Put differently: we define AD on the syntactic category **Syn**, with types $\tau$ as objects and $(\alpha)\beta\eta$-equivalence classes of programs $x : \tau \vdash t : \sigma$ as morphisms $\tau \to \sigma$.

Yet the question remains: why should this translation for tuples be correct? What is even less clear is how to address shortcoming 2. What should the spaces of tangents $\overrightarrow{\mathcal{D}}(\tau \to \sigma)_2$ and adjoints $\overleftarrow{\mathcal{D}}(\tau \to \sigma)_2$ look like? This is not something we are taught in Calculus 1.01. Instead, we again employ category theory:

**Insight 2.** Follow where the categorical structure of the syntax leads you, as doing so produces principled definitions that are easy to prove correct.

With the aim of categorical compositionality in mind, we note that our translations compose according to a sort of "syntactic chain-rule", which says that

$$
\begin{aligned}
\overrightarrow{\mathcal{D}}(t[s/x]) &= (\overrightarrow{\mathcal{D}}(t)_1[\overrightarrow{\mathcal{D}}(s)_1/x],\ \lambda y.\ \overrightarrow{\mathcal{D}}(t)_2[\overrightarrow{\mathcal{D}}(s)_1/x](\overrightarrow{\mathcal{D}}(s)_2(y))) \\
\overleftarrow{\mathcal{D}}(t[s/x]) &= (\overleftarrow{\mathcal{D}}(t)_1[\overleftarrow{\mathcal{D}}(s)_1/x],\ \lambda y.\ \overleftarrow{\mathcal{D}}(s)_2(\overleftarrow{\mathcal{D}}(t)_2[\overleftarrow{\mathcal{D}}(s)_1/x](y))).
\end{aligned}
$$

By the following trick, these equations are functoriality laws. Given a Cartesian closed category $(\mathcal{C}, \mathbf{1}, \times, \Rightarrow)$, define categories $\overrightarrow{\mathcal{D}}[\mathcal{C}]$ and $\overleftarrow{\mathcal{D}}[\mathcal{C}]$ whose objects are pairs $(A_1, A_2)$ of objects $A_1, A_2$ of $\mathcal{C}$ and whose morphisms are

$$
\begin{aligned}
\overrightarrow{\mathcal{D}}[\mathcal{C}]((A_1, A_2), (B_1, B_2)) &\stackrel{\text{def}}{=} \mathcal{C}(A_1, B_1) \times \mathcal{C}(A_1, A_2 \Rightarrow B_2) \\
\overleftarrow{\mathcal{D}}[\mathcal{C}]((A_1, A_2), (B_1, B_2)) &\stackrel{\text{def}}{=} \mathcal{C}(A_1, B_1) \times \mathcal{C}(A_1, B_2 \Rightarrow A_2).
\end{aligned}
$$

Both have identities $\mathrm{id}_{(A_1, A_2)} \stackrel{\text{def}}{=} (\mathrm{id}_{A_1}, \Lambda(\pi_2))$, where we write $\Lambda$ for categorical currying and $\pi_2$ for the second projection. The composition in $\overrightarrow{\mathcal{D}}[\mathcal{C}]$ and $\overleftarrow{\mathcal{D}}[\mathcal{C}]$, respectively, of $(A_1, A_2) \xrightarrow{(k_1, k_2)} (B_1, B_2) \xrightarrow{(l_1, l_2)} (C_1, C_2)$ is

$$
\begin{aligned}
(k_1, k_2); (l_1, l_2) &\stackrel{\text{def}}{=} (k_1; l_1,\ \lambda a_1 : A_1.\ \lambda a_2 : A_2.\ l_2(k_1(a_1))(k_2(a_1)(a_2))) \\
(k_1, k_2); (l_1, l_2) &\stackrel{\text{def}}{=} (k_1; l_1,\ \lambda a_1 : A_1.\ \lambda c_2 : C_2.\ k_2(a_1)(l_2(k_1(a_1))(c_2))),
\end{aligned}
$$

where we work in the internal language of C. Then, we have defined two functors:
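These composition laws can be transcribed almost verbatim into Python, taking $\mathcal{C}$ to be ordinary functions (an informal illustration, not a formal model; the example morphisms are our own choices).

```python
# Generic sketch of composition in the forward and reverse categories
# above, with C taken to be plain Python functions.  A morphism
# (A1,A2) -> (B1,B2) is a pair (k1, k2) with k1 : A1 -> B1 and, in
# forward mode, k2 : A1 -> (A2 -> B2); in reverse mode,
# k2 : A1 -> (B2 -> A2).

def compose_fwd(k, l):
    k1, k2 = k
    l1, l2 = l
    return (lambda a1: l1(k1(a1)),
            lambda a1: lambda a2: l2(k1(a1))(k2(a1)(a2)))

def compose_rev(k, l):
    k1, k2 = k
    l1, l2 = l
    return (lambda a1: l1(k1(a1)),
            lambda a1: lambda c2: k2(a1)(l2(k1(a1))(c2)))

# Example morphisms on R: squaring and tripling, with their derivatives
# (which, on R, are self-transposed, so the same pairs serve both modes).
sq = (lambda x: x * x, lambda x: lambda dx: 2 * x * dx)
tr = (lambda x: 3 * x, lambda x: lambda dx: 3 * dx)

f1, f2 = compose_fwd(sq, tr)      # x |-> 3x^2, derivative 6x*dx
assert f1(2.0) == 12.0 and f2(2.0)(1.0) == 12.0

r1, r2 = compose_rev(sq, tr)      # same scalar derivative via adjoints
assert r1(2.0) == 12.0 and r2(2.0)(1.0) == 12.0
```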

$$
\overrightarrow{\mathcal{D}} : \mathbf{Syn}_1 \to \overrightarrow{\mathcal{D}}[\mathbf{Syn}] \qquad\qquad \overleftarrow{\mathcal{D}} : \mathbf{Syn}_1 \to \overleftarrow{\mathcal{D}}[\mathbf{Syn}],
$$

where we write $\mathbf{Syn}_1$ for the syntactic category of our restrictive first-order language, and $\mathbf{Syn}$ for that of the full λ-calculus. We would like to extend these to functors

$$\mathbf{Syn} \to \overrightarrow{\mathcal{D}}[\mathbf{Syn}] \qquad\qquad \mathbf{Syn} \to \overleftarrow{\mathcal{D}}[\mathbf{Syn}].$$

$\overrightarrow{\mathcal{D}}[\mathcal{C}]$ turns out to be a category with finite products, given by $(A_1, A_2) \times (B_1, B_2) = (A_1 \times B_1, A_2 \times B_2)$. Thus, we can easily extend $\overrightarrow{\mathcal{D}}$ to apply to an extension of $\mathbf{Syn}_1$ with tuples, by extending the functor in the unique structure-preserving way. However, $\overleftarrow{\mathcal{D}}[\mathbf{Syn}]$ does not have products, and neither $\overrightarrow{\mathcal{D}}[\mathbf{Syn}]$ nor $\overleftarrow{\mathcal{D}}[\mathbf{Syn}]$ supports function types. (The reason turns out to be that not all functions are linear in the sense of respecting $\underline{0}$ and $+$.) Therefore, the categorical structure does not give us guidance on how to extend our translation to all of $\mathbf{Syn}$.

**Insight 3.** Linear types can help. By using a more fine-grained type system, we can capture the linearity of the derivative. As a result, we can phrase AD on our full language simply as the unique structure-preserving functor that extends the uncontroversial definitions given so far.

To implement this insight, we extend our λ-calculus to a language **LSyn** with limited linear types (in §4): linear function types and a kind of multiplicative conjunction $!(-) \otimes (-)$, in the sense of the enriched effect calculus [14]. The algebraic effect giving rise to these linear types, in this instance, is that of the theory of commutative monoids. As we have seen, such monoids are intimately related to reverse AD. Consequently, we demand that every $f$ with a linear function type $\tau \multimap \sigma$ is indeed linear, in the sense that $f\ \underline{0} = \underline{0}$ and $f\ (t + s) = (f\ t) + (f\ s)$. For the categorically inclined reader: that is, we enrich **LSyn** over the category of commutative monoids.
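The linearity requirement can be checked concretely. In the following Python sketch (our example, with $\mathbf{real}^2$ encoded as pairs of floats), the derivative of multiplication at a fixed primal point is such a linear map, satisfying $f\ \underline{0} = \underline{0}$ and $f\ (t + s) = (f\ t) + (f\ s)$.

```python
# Sketch (our encoding) checking the linearity laws required of a map
# with a linear function type: f 0 = 0 and f (t + s) = f t + f s.
# The derivative of multiplication at a fixed primal point is such a map.

def d_mul_at(x1, x2):
    # Directional derivative of (x1, x2) |-> x1*x2; linear in the tangent.
    return lambda t: x2 * t[0] + x1 * t[1]

f = d_mul_at(3.0, 4.0)
zero = (0.0, 0.0)
t, s = (1.0, 2.0), (5.0, -1.0)
add = lambda u, v: (u[0] + v[0], u[1] + v[1])

assert f(zero) == 0.0                 # f 0 = 0
assert f(add(t, s)) == f(t) + f(s)    # f (t + s) = f t + f s
```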

Now, we can give more precise types to our derivatives, as we know they are linear functions: for $x : \tau \vdash t : \sigma$, we have $x : \overrightarrow{\mathcal{D}}(\tau)_1 \vdash \overrightarrow{\mathcal{D}}(t)_2 : \overrightarrow{\mathcal{D}}(\tau)_2 \multimap \overrightarrow{\mathcal{D}}(\sigma)_2$ and $x : \overleftarrow{\mathcal{D}}(\tau)_1 \vdash \overleftarrow{\mathcal{D}}(t)_2 : \overleftarrow{\mathcal{D}}(\sigma)_2 \multimap \overleftarrow{\mathcal{D}}(\tau)_2$. Therefore, given any model $\mathcal{L}$ of our linear type theory, we generalise our previous construction of the categories $\overrightarrow{\mathcal{D}}[\mathcal{L}]$ and $\overleftarrow{\mathcal{D}}[\mathcal{L}]$, but now with linear functions in the second component. Unlike before, both $\overrightarrow{\mathcal{D}}[\mathcal{L}]$ and $\overleftarrow{\mathcal{D}}[\mathcal{L}]$ are now Cartesian closed (by §6)!

Thus, we find the following corollary, by the universal property of **Syn**. This property states that any well-typed choice of interpretations F(op) of the primitive operations in a Cartesian closed category C extends to a unique Cartesian closed functor F : **Syn** → C. It gives a principled definition of AD and explains in what sense reverse AD is the "mirror image" of forward AD.

**Corollary** (Definition of AD, §7)**.** Once we fix the interpretations of the primitive operations $\mathsf{op}$ to their respective derivatives and transposed derivatives, we obtain unique structure-preserving forward and reverse AD functors $\overrightarrow{\mathcal{D}} : \mathbf{Syn} \to \overrightarrow{\mathcal{D}}[\mathbf{LSyn}]$ and $\overleftarrow{\mathcal{D}} : \mathbf{Syn} \to \overleftarrow{\mathcal{D}}[\mathbf{LSyn}]$.

In particular, the following definitions are forced on us by the theory:

**Insight 4.** For reverse AD, an adjoint at function type $\tau \to \sigma$ needs to keep track of the incoming adjoints $v$ of type $\overleftarrow{\mathcal{D}}(\sigma)_2$ for each primal $x$ of type $\overleftarrow{\mathcal{D}}(\tau)_1$ on which we call the function. We store these pairs $(x, v)$ in the type $!\overleftarrow{\mathcal{D}}(\tau)_1 \otimes \overleftarrow{\mathcal{D}}(\sigma)_2$ (which, we will see, is essentially a quotient of a list of pairs of type $\overleftarrow{\mathcal{D}}(\tau)_1 \ast \overleftarrow{\mathcal{D}}(\sigma)_2$). Less surprisingly, for forward AD, a tangent at function type $\tau \to \sigma$ consists of a function sending each argument primal of type $\overrightarrow{\mathcal{D}}(\tau)_1$ to the outgoing tangent of type $\overrightarrow{\mathcal{D}}(\sigma)_2$.

$$
\begin{aligned}
\overrightarrow{\mathcal{D}}(\tau \to \sigma) &\stackrel{\text{def}}{=} (\overrightarrow{\mathcal{D}}(\tau)_1 \to (\overrightarrow{\mathcal{D}}(\sigma)_1 \ast (\overrightarrow{\mathcal{D}}(\tau)_2 \multimap \overrightarrow{\mathcal{D}}(\sigma)_2)),\ \overrightarrow{\mathcal{D}}(\tau)_1 \to \overrightarrow{\mathcal{D}}(\sigma)_2) \\
\overleftarrow{\mathcal{D}}(\tau \to \sigma) &\stackrel{\text{def}}{=} (\overleftarrow{\mathcal{D}}(\tau)_1 \to (\overleftarrow{\mathcal{D}}(\sigma)_1 \ast (\overleftarrow{\mathcal{D}}(\sigma)_2 \multimap \overleftarrow{\mathcal{D}}(\tau)_2)),\ {!\overleftarrow{\mathcal{D}}(\tau)_1} \otimes \overleftarrow{\mathcal{D}}(\sigma)_2)
\end{aligned}
$$

With these definitions in place, we turn to the correctness of the source-code transformations. To phrase correctness, we first need to construct a suitable denotational semantics with an uncontroversial notion of semantic differentiation. A technical challenge arises, as the usual calculus setting of Euclidean spaces (or manifolds) and smooth functions cannot interpret higher-order functions. To solve this problem, we work with a conservative extension of this standard calculus setting (see §5): the category **Diff** of diffeological spaces. We model our types as diffeological spaces, and programs as smooth functions. By keeping track of a commutative monoid structure on these spaces, we are also able to interpret the required linear types. We write **DiffCM** for this "linear" category of commutative diffeological monoids and smooth monoid homomorphisms.

By the universal properties of the syntax, we obtain canonical, structure-preserving functors $\llbracket - \rrbracket : \textbf{LSyn} \to \textbf{DiffCM}$ and $\llbracket - \rrbracket : \textbf{Syn} \to \textbf{Diff}$ once we fix interpretations $\mathbf{R}^n$ of $\textbf{real}^n$ and well-typed interpretations $\llbracket \mathit{op} \rrbracket$ for each operation $\mathit{op}$. These functors define a semantics for our language.

Having constructed the semantics, we can turn to the correctness proof (of §8). Because calculus does not provide an unambiguous notion of derivative at function spaces, we cannot prove that the AD transformations correctly implement mathematical derivatives by plain induction on the syntax. Instead, we use a logical relations argument over the semantics, which we phrase categorically:

**Insight 5.** Once we show that the derivatives of primitive operations op are correctly implemented, correctness of derivatives of other programs follows from a standard logical relations construction over the semantics that relates a curve to its (co)tangent curve. By the chain-rule, all programs respect the logical relations.

To show correctness of forward AD, we construct a category $\overrightarrow{\textbf{SScone}}$ whose objects are triples $((X, (Y_1, Y_2)), P)$ of an object $X$ of **Diff**, an object $(Y_1, Y_2)$ of $\overrightarrow{\mathcal{D}}[\textbf{DiffCM}]$ and a predicate $P$ on $\textbf{Diff}(\mathbf{R}, X) \times \overrightarrow{\mathcal{D}}[\textbf{DiffCM}]((\mathbf{R}, \mathbf{R}), (Y_1, Y_2))$. Its morphisms $((X, (Y_1, Y_2)), P) \xrightarrow{(f, (g, h))} ((X', (Y_1', Y_2')), P')$ are pairs of morphisms $X \xrightarrow{f} X'$ and $(Y_1, Y_2) \xrightarrow{(g, h)} (Y_1', Y_2')$ such that for any $(\gamma, (\delta_1, \delta_2)) \in P$, we have that $(\gamma; f, (\delta_1, \delta_2); (g, h)) \in P'$. $\overrightarrow{\textbf{SScone}}$ is a standard category of logical relations, or subscone, and it is widely known to inherit the Cartesian closure of $\textbf{Diff} \times \overrightarrow{\mathcal{D}}[\textbf{DiffCM}]$ (see §8.1). It also comes equipped with a Cartesian closed functor $\overrightarrow{\textbf{SScone}} \to \textbf{Diff} \times \overrightarrow{\mathcal{D}}[\textbf{DiffCM}]$. Therefore, once we fix predicates $P^f_{\textbf{real}^n}$ on $(\llbracket - \rrbracket, \overrightarrow{\mathcal{D}}[\llbracket - \rrbracket])(\textbf{real}^n)$ and show that all operations $\mathit{op}$ respect these predicates, it follows that our denotational semantics lifts to give a unique structure-preserving functor $\textbf{Syn} \to \overrightarrow{\textbf{SScone}}$, such that the left diagram below commutes (by the universal property of **Syn**).

$$\begin{array}{ccc} \textbf{Syn} & \xrightarrow{(\mathrm{id}, \overrightarrow{\mathcal{D}})} & \textbf{Syn} \times \overrightarrow{\mathcal{D}}[\textbf{LSyn}] \\ \downarrow & & \downarrow \\ \overrightarrow{\textbf{SScone}} & \longrightarrow & \textbf{Diff} \times \overrightarrow{\mathcal{D}}[\textbf{DiffCM}] \end{array} \qquad\qquad \begin{array}{ccc} \textbf{Syn} & \xrightarrow{(\mathrm{id}, \overleftarrow{\mathcal{D}})} & \textbf{Syn} \times \overleftarrow{\mathcal{D}}[\textbf{LSyn}] \\ \downarrow & & \downarrow \\ \overleftarrow{\textbf{SScone}} & \longrightarrow & \textbf{Diff} \times \overleftarrow{\mathcal{D}}[\textbf{DiffCM}] \end{array}$$

Consequently, we can work with $P^f_{\textbf{real}^n} \stackrel{\text{def}}{=} \{(f, (g, h)) \mid g = f \text{ and } h = Df\}$, where we write $Df(x)(v)$ for the multivariate calculus derivative of $f$ at a point $x$ evaluated at a tangent vector $v$. By an application of the chain rule for differentiation, we see that every $\mathit{op}$ respects this predicate, as long as $\llbracket D\mathit{op} \rrbracket = D\llbracket \mathit{op} \rrbracket$. The commuting of our diagram then establishes the correctness of forward AD. The only remaining step in the argument is to note that any tangent vector at $\llbracket \tau \rrbracket \cong \mathbf{R}^N$, for first-order $\tau$, can be represented by a curve $\mathbf{R} \to \llbracket \tau \rrbracket$. For reverse AD, the same construction works, if $\llbracket D\mathit{op}^t \rrbracket = (D\llbracket \mathit{op} \rrbracket)^t$, by replacing $\overrightarrow{\mathcal{D}}[-]$ with $\overleftarrow{\mathcal{D}}[-]$ and $\overrightarrow{\mathcal{D}}$ with $\overleftarrow{\mathcal{D}}$. We can then choose $P^r_{\textbf{real}^n} \stackrel{\text{def}}{=} \{(f, (g, h)) \mid g = f \text{ and } h = x \mapsto (Df(x))^t\}$ as the predicates, where we write $A^t$ for the matrix transpose of $A$. We obtain our main theorem, which crucially holds even for terms $t$ that involve higher-order subprograms.

**Theorem** (Correctness of AD, Thm. 1)**.** For any typed term $x : \tau \vdash t : \sigma$ in **Syn** between first-order types $\tau, \sigma$, we have that

> $\llbracket \overrightarrow{\mathcal{D}}(t)_2 \rrbracket(x) = D\llbracket t \rrbracket(x)$ and $\llbracket \overleftarrow{\mathcal{D}}(t)_2 \rrbracket(x) = \left(D\llbracket t \rrbracket(x)\right)^t$.

Next, we address the practicality of our method (in §9). The code transformations we employ are not too daunting to implement. It is well known how to mechanically translate λ-calculus and functional languages into a (categorical) combinatory form [12]. However, the implementation of the required linear types presents a challenge. Indeed, types like $!(-) \otimes (-)$ and $(-) \multimap (-)$ are absent from languages such as Haskell and OCaml. Luckily, in this instance, we can implement them as abstract data types by using a (basic) module system:

**Insight 6.** Under the hood, $!\tau \otimes \sigma$ can consist of a list of values of type $\tau * \sigma$. Its API ensures that the list order and the difference between xs ++ [(t, s), (t, s')] ++ ys and xs ++ [(t, s + s')] ++ ys cannot be observed: as such, it is a quotient type. Meanwhile, $\tau \multimap \sigma$ can be implemented as a standard function type $\tau \to \sigma$ with a limited API that enforces that we can only ever construct linear functions: as such, it is a subtype.
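This insight can be sketched concretely (our own illustration, with hypothetical names; the paper's artefact uses Haskell/OCaml modules, while we show the same idea in Python). In a real module, only the functions below would be exported, hiding the internal list and the wrapped function, so the quotient and subtype structure is unobservable:

```python
# !tau (x) sigma: a bag of (primal, adjoint) pairs, used as a quotient type:
# neither list order nor merging (t, s), (t, s') into (t, s + s') is
# observable through the API below.
class Tensor:
    def __init__(self, pairs):
        self._pairs = list(pairs)

def single(t, s):
    return Tensor([(t, s)])

def combine(a, b):
    return Tensor(a._pairs + b._pairs)

def reduce_tensor(f, zero, plus, a):
    """The only eliminator: fold with a commutative monoid (zero, plus),
    which is exactly what keeps the quotient unobservable."""
    acc = zero
    for (t, s) in a._pairs:
        acc = plus(acc, f(t, s))
    return acc

# tau -o sigma: an ordinary function behind a constructor that, by the
# module's convention, is only applied to linear functions: a subtype of
# the function type tau -> sigma.
class Lin:
    def __init__(self, f):
        self._f = f

def apply_lin(l, x):
    return l._f(x)
```

For example, summing all adjoints in a bag with `reduce_tensor(lambda t, s: s, 0, lambda a, b: a + b, ...)` gives the same answer on any representative of the quotient.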

We phrase the correctness proof of the AD transformations in elementary terms, such that it holds in the applied setting where we use abstract types to implement linear types. We show that our correctness results are meaningful, as they make use of a denotational semantics that is adequate with respect to the standard operational semantics. Finally, to stress the applicability of our method, we show that it extends to higher-order (primitive) operations, such as **map**.

# **3** *λ***-Calculus as a Source Language for AD**

As a source language for our AD translations, we can begin with a standard, simply typed λ-calculus which has ground types $\textbf{real}^n$ of statically sized arrays of $n$ real numbers, for all $n \in \mathbf{N}$, and sets $\mathrm{Op}^m_{n_1,\dots,n_k}$ of primitive operations $\mathit{op}$ for all $k, m, n_1,\dots,n_k \in \mathbf{N}$. These operations will be interpreted as smooth functions $(\mathbf{R}^{n_1} \times \dots \times \mathbf{R}^{n_k}) \to \mathbf{R}^m$. Examples to keep in mind for $\mathit{op}$ include


We intentionally present operations in a schematic way, as primitive operations tend to form a collection that is extended in a by-need fashion as an AD library develops. The precise operations needed will depend on the applications, but, in statistics and machine learning applications, Op tends to include a mix of multi-dimensional linear algebra operations and mostly one-dimensional nonlinear functions. A typical library for use in machine learning would work with multi-dimensional arrays (sometimes called "tensors"). We focus here on one-dimensional arrays, as the issues of how precisely to represent the arrays are orthogonal to the concerns of our development.

The types τ, σ, ρ and terms t, s, r of our AD source language are as follows:


The typing rules are in Fig. 1, where we write $\mathbf{Dom}(\mathit{op}) \stackrel{\text{def}}{=} \textbf{real}^{n_1} * \dots * \textbf{real}^{n_k}$ for an operation $\mathit{op} \in \mathrm{Op}^m_{n_1,\dots,n_k}$. We employ the usual syntactic sugar $\textbf{let}\ x = t\ \textbf{in}\ s \stackrel{\text{def}}{=} (\lambda x. s)\, t$ and write $\textbf{real}$ for $\textbf{real}^1$. As Fig. 2 displays, we consider the terms of our language up to the standard βη-theory. We could consider further equations for our operations, but we do not, as we will not need them.

This standard λ-calculus is widely known to be equivalent to the free Cartesian closed category **Syn** generated by the objects **real**<sup>n</sup> and the morphisms op. **Syn** effectively represents programs as (categorical) combinators, also known as "point-free style" in the functional programming community. Indeed, there are well-studied mechanical translations from the λ-calculus to the free Cartesian closed category (and back) [26,13]. The translation from **Syn** to λ-calculus is self-evident, while the translation in the opposite direction is straightforward after we first convert our λ-terms to de Bruijn indexed form. Concretely,

	- identities: $\mathrm{id}_\tau \in \textbf{Syn}(\tau, \tau)$ (cf. variables up to α-equivalence);
	- composition: $t; s \in \textbf{Syn}(\tau, \rho)$ for any $t \in \textbf{Syn}(\tau, \sigma)$ and $s \in \textbf{Syn}(\sigma, \rho)$ (corresponding to the capture-avoiding substitution $s[t/y]$ if we represent $x : \tau \vdash t : \sigma$ and $y : \sigma \vdash s : \rho$);
	- terminal morphisms: $\langle\rangle_\tau \in \textbf{Syn}(\tau, \mathbf{1})$;
	- product pairing: $\langle t, s \rangle \in \textbf{Syn}(\tau, \sigma * \rho)$ for any $t \in \textbf{Syn}(\tau, \sigma)$ and $s \in \textbf{Syn}(\tau, \rho)$;
	- product projections: $\textbf{fst}_{\tau,\sigma} \in \textbf{Syn}(\tau * \sigma, \tau)$ and $\textbf{snd}_{\tau,\sigma} \in \textbf{Syn}(\tau * \sigma, \sigma)$;


**Fig. 1.** Typing rules for the AD source language.


**Fig. 2.** Standard βη-laws for products and functions. We write $\#x_1,\dots,x_n$ to indicate that the variables $x_1,\dots,x_n$ need to be fresh in the left-hand side. Equations hold on pairs of terms of the same type. As usual, we only distinguish terms up to α-renaming of bound variables.


	- all subject to the usual equations of a Cartesian closed category [26].

**1** and *∗* give finite products in **Syn**, while → gives categorical exponentials.

**Syn** has the following universal property: for any Cartesian closed category $(\mathcal{C}, \mathbf{1}, \times, \Rightarrow)$, we obtain a unique Cartesian closed functor $F : \textbf{Syn} \to \mathcal{C}$, once we choose objects $F\textbf{real}^n$ of $\mathcal{C}$ as well as, for each $\mathit{op} \in \mathrm{Op}^m_{n_1,\dots,n_k}$, make well-typed choices of $\mathcal{C}$-morphisms $F\mathit{op} : (F\textbf{real}^{n_1} \times \dots \times F\textbf{real}^{n_k}) \to F\textbf{real}^m$.

# **4 Linear** *λ***-Calculus as an Idealised AD Target Language**

As a target language for our AD source code transformations, we consider a language that extends the language of §3 with limited linear types. We could opt to work with a full linear logic as in [6] or [4]. Instead, however, we will only include the bare minimum of linear type formers that we actually need to phrase the AD transformations. The resulting language is closely related to, but more minimal than, the Enriched Effect Calculus of [14]. We limit our language in this way because we want to stress that the resulting code transformations can easily be implemented in existing functional languages such as Haskell or OCaml. As we discuss in §9, the idea will be to make use of a module system to implement the required linear types as abstract data types.

In our idealised target language, we consider linear types (aka computation types) $\underline{\tau}, \underline{\sigma}, \underline{\rho}$, in addition to the Cartesian types (aka value types) $\tau, \sigma, \rho$ that we have considered so far. We think of Cartesian types as denoting spaces and linear types as denoting spaces equipped with an algebraic structure. As we are interested in studying differentiation, the relevant space structure in this instance is a geometric structure that suffices to define differentiability. Meanwhile, the relevant algebraic structure on linear types turns out to be that of a commutative monoid, as this algebraic structure is needed to phrase automatic differentiation algorithms. Indeed, we will use the linear types to denote spaces of (co)tangent vectors to the spaces of primals denoted by Cartesian types. These spaces of (co)tangents form a commutative monoid under addition.

Concretely, we extend the types and terms of our language as follows:



**Fig. 3.** Typing rules for the idealised AD target language with linear types.

$$\begin{array}{rll} t, s, r ::= & \dots & \text{terms as in §3} \\ \mid & \mathit{lop}(t; s) & \text{linear operation} \\ \mid & !t \otimes s \ \mid\ \textbf{case}\ t\ \textbf{of}\ !y \otimes z \to s & \text{tensor product} \\ \mid & \lambda x. t \ \mid\ t\{s\} & \text{abstraction/application} \\ \mid & 0 \ \mid\ t + s & \text{monoid structure} \end{array}$$

We work with linear operations $\mathit{lop} \in \mathrm{LOp}^m_{n_1,\dots,n_k;\, n'_1,\dots,n'_l}$, which are intended to represent functions that are linear (in the sense of respecting $0$ and $+$) in the last $l$ arguments but not in the first $k$. We write $\mathbf{Dom}(\mathit{lop}) \stackrel{\text{def}}{=} \textbf{real}^{n_1} * \dots * \textbf{real}^{n_k}$ and $\mathbf{LDom}(\mathit{lop}) \stackrel{\text{def}}{=} \textbf{real}^{n'_1} * \dots * \textbf{real}^{n'_l}$ for $\mathit{lop} \in \mathrm{LOp}^m_{n_1,\dots,n_k;\, n'_1,\dots,n'_l}$. These operations can include e.g. dense and sparse matrix-vector multiplications. Their purpose is to serve as primitives for implementing the derivatives $D\mathit{op}(x; y)$ and transposed derivatives $(D\mathit{op})^t(x; y)$ of the operations $\mathit{op}$ from the source language as terms that are linear in $y$.
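As a concrete illustration (our own example; the paper fixes no particular operation), take $\mathit{op}(x_1, x_2) = x_1 \cdot x_2$. Its derivative and transposed derivative are linear in the arguments after the semicolon, but not in the primals:

```python
# Derivative of op(x1, x2) = x1 * x2:
#   D op((x1, x2); (v1, v2)) = x2*v1 + x1*v2, linear in (v1, v2).
def d_mul(x, v):
    (x1, x2), (v1, v2) = x, v
    return x2 * v1 + x1 * v2

# Transposed derivative:
#   (D op)^t((x1, x2); w) = (x2*w, x1*w), linear in w.
def d_mul_t(x, w):
    (x1, x2) = x
    return (x2 * w, x1 * w)
```

Linearity in the second argument can be checked directly: `d_mul((2, 3), (6, 7))` equals `6 * d_mul((2, 3), (1, 0)) + 7 * d_mul((2, 3), (0, 1))`, while no such decomposition holds in the primal argument.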

In addition to the judgement $\Gamma \vdash t : \tau$, which we encountered in §3, we now consider an additional judgement $\Gamma; x : \underline{\tau} \vdash t : \underline{\sigma}$. While we think of the former as denoting a (structure-preserving) function between spaces, we think of the latter as a (structure-preserving) function from the space which $\Gamma$ denotes to the space of (structure-preserving) monoid homomorphisms from the denotation of $\underline{\tau}$ to that of $\underline{\sigma}$. In this instance, "structure-preserving" will mean differentiable.

Fig. 3 displays the typing rules of our language. We consider the terms of this language up to the βη+-equational theory of Fig. 4. It includes βη-rules as well as commutative monoid and homomorphism laws.


**Fig. 4.** Equational rules for the idealised, linear AD language, which we use on top of the rules of Fig. 2. In addition to standard βη-rules for $!(-) \otimes (-)$- and $\multimap$-types, we add rules making $(0, +)$ into a commutative monoid on the terms of each linear type, as well as rules which say that terms of linear types are homomorphisms in their linear variable. Equations hold on pairs of terms of the same type.

# **5 Semantics of the Source and Target Languages**

# **5.1 Preliminaries**

**Category theory** We assume familiarity with categories, functors, natural transformations, and their theory of (co)limits and adjunctions. We write:


**Monoids** We assume familiarity with the category **CMon** of commutative monoids $X = (|X|, 0_X, +_X)$, such as $\mathbf{R}^n \stackrel{\text{def}}{=} (\mathbf{R}^n, 0, +)$, their Cartesian product $X \times Y$, tensor product $X \otimes Y$, and the free monoid $!S$ on a set $S$ (write $\delta$ for the inclusion $S \hookrightarrow |!S|$). We will sometimes write $\sum_{i=1}^n x_i$ for $(((x_1 + x_2) + \dots) + x_n)$.

Recall that a category $\mathcal{C}$ is called **CMon**-enriched if we have a commutative monoid structure on each homset $\mathcal{C}(C, C')$ and function composition gives monoid homomorphisms $\mathcal{C}(C, C') \otimes \mathcal{C}(C', C'') \to \mathcal{C}(C, C'')$. Finite products in a category $\mathcal{C}$ are well known to be biproducts (i.e. simultaneously products and coproducts) if and only if $\mathcal{C}$ is **CMon**-enriched (see e.g. [17]): define $[\,] \stackrel{\text{def}}{=} 0$ and $[f, g] \stackrel{\text{def}}{=} \pi_1; f + \pi_2; g$ and, conversely, $0 \stackrel{\text{def}}{=} [\,]$ and $f + g \stackrel{\text{def}}{=} \langle \mathrm{id}, \mathrm{id} \rangle; [f, g]$.
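The biproduct formulas can be checked in a toy **CMon**-enriched setting (our own illustration): morphisms are additive maps on integers, homsets are commutative monoids under pointwise addition, and copairing out of a product is definable from that addition.

```python
# [f, g] = pi1; f + pi2; g : the coproduct structure of the product,
# recovered from the monoid structure on homsets.
def copair(f, g):
    return lambda xy: f(xy[0]) + g(xy[1])

# f + g = <id, id>; [f, g] : conversely, addition of morphisms is
# recovered from copairing and diagonal.
def plus(f, g):
    return lambda x: copair(f, g)((x, x))
```

For instance, with $f = 2\cdot(-)$ and $g = 3\cdot(-)$, both definitions agree with the expected pointwise sums.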

### **5.2 Abstract Semantics**

The language of §3 has a canonical interpretation in any Cartesian closed category $(\mathcal{C}, \mathbf{1}, \times, \Rightarrow)$, once we fix $\mathcal{C}$-objects $\llbracket \textbf{real}^n \rrbracket$ to interpret $\textbf{real}^n$ and $\mathcal{C}$-morphisms $\llbracket \mathit{op} \rrbracket \in \mathcal{C}(\llbracket \mathbf{Dom}(\mathit{op}) \rrbracket, \llbracket \textbf{real}^m \rrbracket)$ to interpret $\mathit{op} \in \mathrm{Op}^m_{n_1,\dots,n_k}$. We interpret types $\tau$ and contexts $\Gamma$ as $\mathcal{C}$-objects $\llbracket \tau \rrbracket$ and $\llbracket \Gamma \rrbracket$:

$$\llbracket x_1 : \tau_1, \dots, x_n : \tau_n \rrbracket \stackrel{\text{def}}{=} \llbracket \tau_1 \rrbracket \times \dots \times \llbracket \tau_n \rrbracket \qquad \llbracket \mathbf{1} \rrbracket \stackrel{\text{def}}{=} \mathbf{1} \qquad \llbracket \tau * \sigma \rrbracket \stackrel{\text{def}}{=} \llbracket \tau \rrbracket \times \llbracket \sigma \rrbracket \qquad \llbracket \tau \to \sigma \rrbracket \stackrel{\text{def}}{=} \llbracket \tau \rrbracket \Rightarrow \llbracket \sigma \rrbracket.$$

We interpret terms $\Gamma \vdash t : \tau$ as morphisms $\llbracket t \rrbracket$ in $\mathcal{C}(\llbracket \Gamma \rrbracket, \llbracket \tau \rrbracket)$:

$$\llbracket x_1 : \tau_1, \dots, x_n : \tau_n \vdash x_k : \tau_k \rrbracket \stackrel{\text{def}}{=} \pi_k \qquad \llbracket \langle\rangle \rrbracket \stackrel{\text{def}}{=} \langle\rangle \qquad \llbracket \langle t, s \rangle \rrbracket \stackrel{\text{def}}{=} \langle \llbracket t \rrbracket, \llbracket s \rrbracket \rangle \qquad \llbracket \textbf{fst} \rrbracket \stackrel{\text{def}}{=} \pi_1 \qquad \llbracket \textbf{snd} \rrbracket \stackrel{\text{def}}{=} \pi_2 \qquad \llbracket \lambda x. t \rrbracket \stackrel{\text{def}}{=} \Lambda(\llbracket t \rrbracket) \qquad \llbracket t\, s \rrbracket \stackrel{\text{def}}{=} \langle \llbracket t \rrbracket, \llbracket s \rrbracket \rangle; \mathrm{ev}.$$

This is an instance of the universal property of **Syn** mentioned in §3.

We discuss how to extend $\llbracket - \rrbracket$ to apply to the full target language of §4. Suppose that $\mathcal{L} : \mathcal{C}^{op} \to \mathbf{Cat}$ is a locally indexed category (see e.g. [27, §9.3.4]), i.e. a (strict) contravariant functor from $\mathcal{C}$ to the category $\mathbf{Cat}$ of categories, such that $ob\,\mathcal{L}(C) = ob\,\mathcal{L}(C')$ and $\mathcal{L}(f)(L) = L$ for any object $L$ of $ob\,\mathcal{L}(C')$ and any $f : C \to C'$ in $\mathcal{C}$. We say that $\mathcal{L}$ is biadditive if each category $\mathcal{L}(C)$ has (chosen) finite biproducts $(\mathbf{1}, \times)$ and $\mathcal{L}(f)$ preserves them, for any $f : C \to C'$ in $\mathcal{C}$, in the sense that $\mathcal{L}(f)(\mathbf{1}) = \mathbf{1}$ and $\mathcal{L}(f)(L \times L') = \mathcal{L}(f)(L) \times \mathcal{L}(f)(L')$. We say that it supports $!(-) \otimes (-)$-types and $\Rightarrow$-types if $\mathcal{L}(\pi_1)$ has a left adjoint $!C' \otimes_C -$ and a right adjoint functor $C' \Rightarrow_C -$, for each product projection $\pi_1 : C \times C' \to C$ in $\mathcal{C}$, satisfying a Beck-Chevalley condition: $!C'' \otimes_C L =\ !C'' \otimes_{C'} L$ and $C'' \Rightarrow_C L = C'' \Rightarrow_{C'} L$ for any $C, C' \in ob\,\mathcal{C}$. We simply write $!C' \otimes L$ and $C' \Rightarrow L$. Let us write $\Phi$ and $\Psi$ for the natural isomorphisms $\mathcal{L}(C)(!C' \otimes L, L') \xrightarrow{\cong} \mathcal{L}(C \times C')(L, L')$ and $\mathcal{L}(C \times C')(L, L') \xrightarrow{\cong} \mathcal{L}(C)(L, C' \Rightarrow L')$. We say that $\mathcal{L}$ supports Cartesian $\multimap$-types if the functor $\mathcal{C}^{op} \to \mathbf{Set}$; $C \mapsto \mathcal{L}(C)(L, L')$ is representable for any objects $L, L'$ of $\mathcal{L}$. That is, we have objects $L \multimap L'$ of $\mathcal{C}$ with isomorphisms $\Lambda : \mathcal{L}(C)(L, L') \xrightarrow{\cong} \mathcal{C}(C, L \multimap L')$, natural in $C$. We call an $\mathcal{L}$ satisfying all these conditions a categorical model of the language of §4. In particular, any biadditive model of intuitionistic linear logic [29,17] is such a categorical model.

If we choose $\llbracket \textbf{real}^n \rrbracket \in ob\,\mathcal{L}$ to interpret $\textbf{real}^n$ and compatible $\mathcal{L}$-morphisms $\llbracket \mathit{lop} \rrbracket$ in $\mathcal{L}(\llbracket \mathbf{Dom}(\mathit{lop}) \rrbracket)(\llbracket \mathbf{LDom}(\mathit{lop}) \rrbracket, \llbracket \textbf{real}^m \rrbracket)$ for each $\mathit{lop} \in \mathrm{LOp}^m_{n_1,\dots,n_k;\, n'_1,\dots,n'_l}$, then we can interpret linear types $\underline{\tau}$ as objects $\llbracket \underline{\tau} \rrbracket$ of $\mathcal{L}$:

$$\llbracket \mathbf{1} \rrbracket \stackrel{\text{def}}{=} \mathbf{1} \qquad \llbracket \underline{\tau} * \underline{\sigma} \rrbracket \stackrel{\text{def}}{=} \llbracket \underline{\tau} \rrbracket \times \llbracket \underline{\sigma} \rrbracket \qquad \llbracket \tau \Rightarrow \underline{\sigma} \rrbracket \stackrel{\text{def}}{=} \llbracket \tau \rrbracket \Rightarrow \llbracket \underline{\sigma} \rrbracket \qquad \llbracket !\tau \otimes \underline{\sigma} \rrbracket \stackrel{\text{def}}{=}\ !\llbracket \tau \rrbracket \otimes \llbracket \underline{\sigma} \rrbracket.$$

We can interpret $\underline{\tau} \multimap \underline{\sigma}$ as the $\mathcal{C}$-object $\llbracket \underline{\tau} \multimap \underline{\sigma} \rrbracket \stackrel{\text{def}}{=} \llbracket \underline{\tau} \rrbracket \multimap \llbracket \underline{\sigma} \rrbracket$. Finally, we can interpret terms $\Gamma \vdash t : \tau$ as morphisms $\llbracket t \rrbracket$ in $\mathcal{C}(\llbracket \Gamma \rrbracket, \llbracket \tau \rrbracket)$ and terms $\Gamma; x : \underline{\tau} \vdash t : \underline{\sigma}$ as $\llbracket t \rrbracket$ in $\mathcal{L}(\llbracket \Gamma \rrbracket)(\llbracket \underline{\tau} \rrbracket, \llbracket \underline{\sigma} \rrbracket)$:


Observe that we interpret 0 and + using the biproduct structure of L.

**Proposition 1.** The interpretation $\llbracket - \rrbracket$ of the language of §4 in categorical models is both sound and complete with respect to the βη+-equational theory: $t \stackrel{\beta\eta+}{=} s$ iff $\llbracket t \rrbracket = \llbracket s \rrbracket$ in each such model.

Soundness follows by case analysis on the βη+-rules. Completeness follows by the construction of the syntactic model $\textbf{LSyn} : \textbf{CSyn}^{op} \to \mathbf{Cat}$:


### **5.3 Concrete Semantics**

**Diffeological Spaces** Throughout this paper, we have an instance of the abstract semantics of our languages in mind, as we intend to interpret $\textbf{real}^n$ as the usual Euclidean space $\mathbf{R}^n$ and to interpret each program $x_1 : \textbf{real}^{n_1}, \dots, x_k : \textbf{real}^{n_k} \vdash t : \textbf{real}^m$ as a smooth ($C^\infty$) function $\mathbf{R}^{n_1} \times \dots \times \mathbf{R}^{n_k} \to \mathbf{R}^m$. A challenge is that the usual settings for multivariate calculus and differential geometry do not form Cartesian closed categories, obstructing the interpretation of higher types (see [20, Appx. A]). A solution, recently employed by [20], is to work with diffeological spaces [33,21], which generalise the usual notions of differentiability from Euclidean spaces and smooth manifolds to higher types (as well as a range of other types, such as sum and inductive types). We will also follow this route and use such spaces to construct our concrete semantics. Other valid options for a concrete semantics exist: convenient vector spaces [19,7], Frölicher spaces [18], or synthetic differential geometry [25], to name a few. We choose to work with diffeological spaces mostly because they seem to us to provide the simplest way to define and analyse the semantics of a rich class of language features.

Diffeological spaces formalise the intuition that a higher-order function is smooth if it sends smooth functions to smooth functions, meaning that we can never use it to build non-smooth first-order functions. This intuition is reminiscent of a logical relation, and it is realised by directly axiomatising smooth maps into the space, rather than treating smoothness as a derived property.

**Definition 1.** A diffeological space $X = (|X|, \mathcal{P}_X)$ consists of a set $|X|$ together with, for each $n \in \mathbf{N}$ and each open subset $U$ of $\mathbf{R}^n$, a set $\mathcal{P}^U_X$ of functions $U \to |X|$ called plots, such that


We think of plots as the maps that are axiomatically deemed "smooth". We call a function $f : X \to Y$ between diffeological spaces smooth if, for all plots $p \in \mathcal{P}^U_X$, we have that $p; f \in \mathcal{P}^U_Y$. We write $\mathbf{Diff}(X, Y)$ for the set of smooth maps from $X$ to $Y$. Smooth functions compose, and so we have a category **Diff** of diffeological spaces and smooth functions. We give some examples of such spaces.

Example 1 (Manifold diffeology). Given any open subset $X$ of a Euclidean space $\mathbf{R}^n$ (or, more generally, a smooth manifold $X$), we can take the set of smooth ($C^\infty$) functions $U \to X$ in the traditional sense as $\mathcal{P}^U_X$. Given another such space $X'$, $\mathbf{Diff}(X, X')$ then coincides precisely with the set of smooth functions $X \to X'$ in the traditional sense of calculus and differential geometry.

Put differently, the categories **CartSp** of Euclidean spaces and **Man** of smooth manifolds with smooth functions form full subcategories of **Diff**.

Example 2 (Product diffeology). Given diffeological spaces $(X_i)_{i \in I}$, we can equip $\prod_{i \in I} |X_i|$ with the product diffeology: $\mathcal{P}^U_{\prod_{i \in I} X_i} \stackrel{\text{def}}{=} \{(\alpha_i)_{i \in I} \mid \alpha_i \in \mathcal{P}^U_{X_i}\}$.

Example 3 (Functional diffeology). Given diffeological spaces $X, Y$, we can equip $\mathbf{Diff}(X, Y)$ with the functional diffeology $\mathcal{P}^U_{Y^X} \stackrel{\text{def}}{=} \{\Lambda(\alpha) \mid \alpha \in \mathbf{Diff}(U \times X, Y)\}$.

Examples 2 and 3 give us the categorical product and exponential objects, respectively, in **Diff**. The embeddings of **CartSp** and **Man** into **Diff** preserve products (and coproducts).

We work with the concrete semantics, where we fix $\mathcal{C} = \mathbf{Diff}$ as the target for interpreting Cartesian types and their terms. That is, by choosing the interpretation $\llbracket \textbf{real}^n \rrbracket \stackrel{\text{def}}{=} \mathbf{R}^n$, and by interpreting each $\mathit{op} \in \mathrm{Op}^m_{n_1,\dots,n_k}$ as the smooth function $\llbracket \mathit{op} \rrbracket : \mathbf{R}^{n_1} \times \dots \times \mathbf{R}^{n_k} \to \mathbf{R}^m$ that it is intended to represent, we obtain a unique interpretation $\llbracket - \rrbracket : \textbf{CSyn} \to \mathbf{Diff}$.

**Diffeological Monoids** To interpret linear types and their terms, we need a semantic setting L that is both compatible with **Diff** and enriched over the category of commutative monoids. We choose to work with commutative diffeological monoids. That is, commutative monoids internal to the category **Diff**.

**Definition 2.** A diffeological monoid $X = (|X|, \mathcal{P}_X, 0_X, +_X)$ consists of a diffeological space $(|X|, \mathcal{P}_X)$ with a monoid structure $(0_X \in |X|, (+_X) : |X| \times |X| \to |X|)$, such that $+_X$ is smooth. We call a diffeological monoid commutative if the underlying monoid structure on $|X|$ is commutative.

We write **DiffCM** for the category whose objects are commutative diffeological monoids and whose morphisms $(|X|, \mathcal{P}_X, 0_X, +_X) \to (|Y|, \mathcal{P}_Y, 0_Y, +_Y)$ are functions $f : |X| \to |Y|$ that are both smooth $(|X|, \mathcal{P}_X) \to (|Y|, \mathcal{P}_Y)$ and monoid homomorphisms $(|X|, 0_X, +_X) \to (|Y|, 0_Y, +_Y)$. Given that **DiffCM** is **CMon**-enriched, finite products are biproducts.

Example 4. The real numbers **R** form a commutative diffeological monoid **R** by combining its standard diffeology with its usual commutative monoid structure (0, +). Similarly, **N** ∈ **DiffCM** by equipping **N** with (0, +) and the discrete diffeology, in which plots are locally constant functions.

Example 5. We form the (categorical) product in **DiffCM** of $(X_i)_{i \in I}$ by equipping $\prod_{i \in I} |X_i|$ with the product diffeology and product monoid structure.

Example 6. For a commutative diffeological monoid $X$, we can equip the monoid $!(|X|, 0_X, +_X)$ with the diffeology $\mathcal{P}^U_{!X} \stackrel{\text{def}}{=} \{\sum_{i=1}^n \alpha_i; \delta \mid n \in \mathbf{N} \text{ and } \alpha_i \in \mathcal{P}^U_X\}$.

Example 7. Given commutative diffeological monoids $X$ and $Y$, we can equip the tensor product monoid $(|X|, 0_X, +_X) \otimes (|Y|, 0_Y, +_Y)$ with the tensor product diffeology: $\mathcal{P}^U_{X \otimes Y} \stackrel{\text{def}}{=} \{\sum_{i=1}^n \alpha_i \otimes \beta_i \mid n \in \mathbf{N} \text{ and } \alpha_i \in \mathcal{P}^U_X,\ \beta_i \in \mathcal{P}^U_Y\}$.

In this paper, we only use the combined operation !X ⊗ Y (read: (!X) ⊗ Y ).

Example 8. Given commutative diffeological monoids $X$ and $Y$, we can define a commutative diffeological monoid $X \multimap Y$ with underlying set $\mathbf{DiffCM}(X, Y)$, $0_{X \multimap Y}(x) \stackrel{\text{def}}{=} 0_Y$, $(f +_{X \multimap Y} g)(x) \stackrel{\text{def}}{=} f(x) +_Y g(x)$ and $\mathcal{P}^U_{X \multimap Y} \stackrel{\text{def}}{=} \{\alpha : U \to |X \multimap Y| \mid \alpha \in \mathcal{P}^U_{(|X|, \mathcal{P}_X) \Rightarrow (|Y|, \mathcal{P}_Y)}\}$.

In this paper, we will primarily be interested in $X \multimap Y$ as a diffeological space, and we will mostly disregard its monoid structure for now.

Example 9. Given a diffeological space $X$ and a commutative diffeological monoid $Y$, we can define a commutative diffeological monoid structure $X \Rightarrow Y$ on $X \Rightarrow (|Y|, \mathcal{P}_Y)$ by using the pointwise monoid structure: $0_{X \Rightarrow Y}(x) \stackrel{\text{def}}{=} 0_Y$ and $(f +_{X \Rightarrow Y} g)(x) \stackrel{\text{def}}{=} f(x) +_Y g(x)$.

Given $f \in \mathbf{Diff}(X, Y)$, we can define $!f \in \mathbf{DiffCM}(!X, !Y)$ by $!f(\sum_{i=1}^n x_i) \stackrel{\text{def}}{=} \sum_{i=1}^n f(x_i)$. $!$ is a left adjoint to the obvious forgetful functor $\mathbf{DiffCM} \to \mathbf{Diff}$, while $!(X \times Y) \cong\ !X\ \otimes\ !Y$ and $!\mathbf{1} \cong \mathbf{N}$. Seeing that $(\mathbf{N}, \otimes, \multimap)$ defines a symmetric monoidal closed structure on **DiffCM**, cognoscenti will recognise that $(\mathbf{Diff}, \mathbf{1}, \times, \Rightarrow) \rightleftarrows (\mathbf{DiffCM}, \mathbf{N}, \mathbf{1}, \times, \otimes, \multimap)$ is a model of intuitionistic linear logic [29]. In fact, seeing that **DiffCM** is **CMon**-enriched, the model is biadditive [17].

However, we do not need such a rich type system. For us, the following suffices. Define $\mathbf{DiffCM}(X)$, for $X \in ob\,\mathbf{Diff}$, to have the objects of **DiffCM** and homsets $\mathbf{DiffCM}(X)(Y, Z) \stackrel{\text{def}}{=} \mathbf{Diff}(X, Y \multimap Z)$. Identities are defined as $x \mapsto (y \mapsto y)$ and composition $f;_{\mathbf{DiffCM}(X)}\, g$ is defined by $x \mapsto (f(x);_{\mathbf{DiffCM}}\, g(x))$. Given $f \in \mathbf{Diff}(X, X')$, we define change-of-base $\mathbf{DiffCM}(X') \to \mathbf{DiffCM}(X)$ as $\mathbf{DiffCM}(f)(g) \stackrel{\text{def}}{=} f;_{\mathbf{Diff}}\, g$. $\mathbf{DiffCM}(-)$ defines a locally indexed category. By taking $\mathcal{C} = \mathbf{Diff}$ and $\mathcal{L}(-) = \mathbf{DiffCM}(-)$, we obtain a concrete instance of our abstract semantics. Indeed, we have natural isomorphisms

$$\begin{aligned} \mathbf{DiffCM}(X)(!X' \otimes Y, Z) &\xrightarrow{\ \Phi\ } \mathbf{DiffCM}(X \times X')(Y, Z) \\ \mathbf{DiffCM}(X \times X')(Y, Z) &\xrightarrow{\ \Psi\ } \mathbf{DiffCM}(X)(Y, X' \Rightarrow Z) \end{aligned}$$

$$\begin{aligned} \Phi(f)(x, x')(y) &\stackrel{\text{def}}{=} f(x)(\delta(x') \otimes y) & \Phi^{-1}(f)(x)\Big(\sum_{i=1}^{n} \delta(x'_i) \otimes y_i\Big) &\stackrel{\text{def}}{=} \sum_{i=1}^{n} f(x, x'_i)(y_i) \\ \Psi(f)(x)(y)(x') &\stackrel{\text{def}}{=} f(x, x')(y) & \Psi^{-1}(f)(x, x')(y) &\stackrel{\text{def}}{=} f(x)(y)(x'). \end{aligned}$$
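The locally indexed category $\mathbf{DiffCM}(-)$ can be sketched very concretely (our own illustration, with made-up names): a morphism $Y \to Z$ over $X$ is a function from points of $X$ to linear maps $Y \to Z$, composition is pointwise, and change-of-base along $f : X \to X'$ is precomposition.

```python
# Morphisms of DiffCM(X)(Y, Z) modelled as x |-> (linear map y |-> z).
def identity():
    return lambda x: (lambda y: y)            # x |-> (y |-> y)

def compose(f, g):
    """(f ; g)(x) = f(x) ; g(x): compose the linear parts pointwise in x."""
    return lambda x: (lambda y: g(x)(f(x)(y)))

def change_of_base(f, g):
    """DiffCM(f)(g) = f ; g: reindex a morphism over X' to one over X."""
    return lambda x: g(f(x))
```

For instance, with `scale = lambda x: (lambda v: x * v)` and `shift = lambda x: (lambda v: v + x)`, `compose(scale, shift)(3)(4)` first scales by 3 and then shifts by 3.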

The prime motivating examples of morphisms in this category are derivatives. Recall that the derivative at $x$, $Df(x)$, and transposed derivative at $x$, $(Df)^t(x)$, of a smooth function $f : \mathbf{R}^n \to \mathbf{R}^m$ are defined as the unique functions $Df(x) : \mathbf{R}^n \to \mathbf{R}^m$ and $(Df)^t(x) : \mathbf{R}^m \to \mathbf{R}^n$ satisfying

$$Df(x)(v) = \lim\_{\delta \to 0} \frac{f(x + \delta \cdot v) - f(x)}{\delta} \qquad (Df)^t(x)(w) \bullet v = w \bullet Df(x)(v),$$

where we write $v \bullet v'$ for the inner product $\sum_{i=1}^n (\pi_i v) \cdot (\pi_i v')$ of vectors $v, v' \in \mathbf{R}^n$. Now, for $f \in \mathbf{Diff}(\mathbf{R}^n, \mathbf{R}^m)$, $Df$ and $(Df)^t$ give maps in $\mathbf{DiffCM}(\mathbf{R}^n)(\mathbf{R}^n, \mathbf{R}^m)$ and $\mathbf{DiffCM}(\mathbf{R}^n)(\mathbf{R}^m, \mathbf{R}^n)$, respectively. Indeed, derivatives $Df(x)$ of $f$ at $x$ are linear functions, as are transposed derivatives $(Df)^t(x)$, and both depend smoothly on $x$ in case $f$ is $C^\infty$-smooth. Note that the derivatives are not merely linear in the sense of preserving $0$ and $+$; they are also multiplicative, in the sense that $(Df)(x)(c \cdot v) = c \cdot (Df)(x)(v)$. We could have captured this property by working with vector spaces internal to **Diff**. However, we will not need this property to phrase or establish correctness of AD. Therefore, we restrict our attention to the more straightforward structure of commutative monoids.
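The defining property of the transposed derivative, $w \bullet Df(x)(v) = (Df)^t(x)(w) \bullet v$, admits a direct numeric sanity check (our own example) for the smooth map $f(x_1, x_2) = (x_1 x_2,\ x_1 + x_2)$, whose Jacobian at $(x_1, x_2)$ is $\begin{pmatrix} x_2 & x_1 \\ 1 & 1 \end{pmatrix}$:

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def df(x, v):
    """Jacobian-vector product Df(x)(v) for f(x1, x2) = (x1*x2, x1+x2)."""
    (x1, x2), (v1, v2) = x, v
    return [x2 * v1 + x1 * v2, v1 + v2]

def df_t(x, w):
    """Vector-Jacobian product (Df)^t(x)(w): multiply by the transpose."""
    (x1, x2), (w1, w2) = x, w
    return [x2 * w1 + w2, x1 * w1 + w2]
```

Evaluating both sides of $w \bullet Df(x)(v) = (Df)^t(x)(w) \bullet v$ at any concrete $x$, $v$, $w$ gives equal numbers, which is exactly the adjointness that reverse AD exploits.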

Defining $\llbracket \mathbf{real}^n \rrbracket \stackrel{\text{def}}{=} \mathbf{R}^n$ and interpreting each $\mathsf{lop} \in \mathrm{LOp}$ as the smooth function $\llbracket \mathsf{lop} \rrbracket : (\mathbf{R}^{n\_1} \times \cdots \times \mathbf{R}^{n\_k}) \to ((\mathbf{R}^{n'\_1} \times \cdots \times \mathbf{R}^{n'\_l}) \multimap \mathbf{R}^m)$ it is intended to represent, we obtain a canonical interpretation of our target language in $\mathbf{Diff}\_{\mathbf{CM}}$.

# **6 Pairing Primals with Tangents/Adjoints, Categorically**

In this section, we show that any categorical model <sup>L</sup> : <sup>C</sup>op <sup>→</sup> **Cat** of our target language gives rise to two Cartesian closed categories <sup>Σ</sup>C<sup>L</sup> and <sup>Σ</sup>CLop (which we wrote −→<sup>D</sup> [L] and ←− <sup>D</sup> [L] in §2). We believe these observations of Cartesian closure are novel. Surprisingly, they are highly relevant for obtaining a principled understanding of AD on a higher-order language: the former for forward AD, and the latter for reverse AD. Applying these constructions to the syntactic category **LSyn** : **CSyn**op <sup>→</sup> **Cat** of our language, we produce a canonical definition of the AD macros, as the canonical interpretation of the λ-calculus in the Cartesian closed categories Σ**CSynLSyn** and Σ**CSynLSyn**op. In addition, when we apply this construction to the denotational semantics **DiffCM** : **Diff**op <sup>→</sup> **Cat** and invoke a categorical logical relations technique, known as subsconing, we find an elegant correctness proof of the source code transformations. The abstract construction delineated in this section is in many ways the theoretical crux of this paper.

### **6.1 Grothendieck Constructions on Strictly Indexed Categories**

Recall that for any strictly indexed category, i.e. a (strict) functor $\mathcal{L} : \mathcal{C}^{op} \to \mathbf{Cat}$, we can consider its total category (or Grothendieck construction) $\Sigma\_{\mathcal{C}}\mathcal{L}$, which is a fibred category over $\mathcal{C}$ (see [23, sections A1.1.7, B1.3.1]). We can view it as a $\Sigma$-type of categories, which generalizes the Cartesian product. Concretely, its objects are pairs $(A\_1, A\_2)$ of objects $A\_1$ of $\mathcal{C}$ and $A\_2$ of $\mathcal{L}(A\_1)$. Its morphisms $(A\_1, A\_2) \to (B\_1, B\_2)$ are pairs $(f\_1, f\_2)$ of a morphism $f\_1 : A\_1 \to B\_1$ in $\mathcal{C}$ and a morphism $f\_2 : A\_2 \to \mathcal{L}(f\_1)(B\_2)$ in $\mathcal{L}(A\_1)$. Identities are $\mathrm{id}\_{(A\_1, A\_2)} \stackrel{\text{def}}{=} (\mathrm{id}\_{A\_1}, \mathrm{id}\_{A\_2})$ and composition is $(f\_1, f\_2); (g\_1, g\_2) \stackrel{\text{def}}{=} (f\_1; g\_1,\ f\_2; \mathcal{L}(f\_1)(g\_2))$. Further, given a strictly indexed category $\mathcal{L} : \mathcal{C}^{op} \to \mathbf{Cat}$, we can consider its fibrewise dual category $\mathcal{L}^{op} : \mathcal{C}^{op} \to \mathbf{Cat}$, defined as the composition $\mathcal{C}^{op} \xrightarrow{\mathcal{L}} \mathbf{Cat} \xrightarrow{op} \mathbf{Cat}$. Thus, we can apply the same construction to $\mathcal{L}^{op}$ to obtain a category $\Sigma\_{\mathcal{C}}\mathcal{L}^{op}$.
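To see the composition formula in action, the following toy sketch (ours; all names are hypothetical) models a locally indexed category whose fibre over a set $A$ has hom-sets $\mathcal{L}(A)(X, Y)$ = functions $A \to (X \to Y)$, with reindexing by precomposition. Morphisms of the total category are then pairs composed as $(f\_1, f\_2); (g\_1, g\_2) = (f\_1; g\_1,\ x \mapsto f\_2(x); g\_2(f\_1(x)))$, which is exactly the "function paired with its derivative" pattern used for forward AD:

```python
# Toy total category for a locally indexed category: a morphism is a pair
# (f1, f2) with f1 : A -> B and f2(x) a map between fibre objects.
# Composition reindexes the second component along f1.

def compose_total(f, g):
    f1, f2 = f
    g1, g2 = g
    h1 = lambda x: g1(f1(x))
    h2 = lambda x: (lambda v: g2(f1(x))(f2(x)(v)))
    return (h1, h2)

def identity_total():
    return (lambda x: x, lambda x: (lambda v: v))

# Example: pair each smooth map with its derivative action, forward-AD style.
square = (lambda x: x * x, lambda x: (lambda v: 2 * x * v))   # d(x^2) = 2x dx
double = (lambda x: 2 * x, lambda x: (lambda v: 2 * v))       # d(2x)  = 2 dx

h = compose_total(square, double)   # x |-> 2x^2, with derivative 4x dx
assert h[0](3.0) == 18.0
assert h[1](3.0)(1.0) == 12.0       # chain rule: 4 * 3 * 1
assert compose_total(identity_total(), square)[0](2.0) == 4.0
```

The assertion on `h[1]` is precisely the chain rule falling out of the abstract composition formula.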

# **6.2 Structure of** *ΣCL* **and** *ΣCLop* **for Locally Indexed Categories**

§6.1 applies, in particular, to the locally indexed categories of §5. In this case, we analyze the categorical structure of $\Sigma\_{\mathcal{C}}\mathcal{L}$ and $\Sigma\_{\mathcal{C}}\mathcal{L}^{op}$. For reference, we first give a concrete description.

ΣCL is the following category:

**–** objects are pairs $(A\_1, A\_2)$ of objects $A\_1$ of $\mathcal{C}$ and $A\_2$ of $\mathcal{L}$;

**–** morphisms $(A\_1, A\_2) \to (B\_1, B\_2)$ are pairs $(f\_1, f\_2)$ of morphisms $f\_1 : A\_1 \to B\_1$ in $\mathcal{C}$ and $f\_2 : A\_2 \to B\_2$ in $\mathcal{L}(A\_1)$, with identities and composition as in §6.1.

$\Sigma\_{\mathcal{C}}\mathcal{L}^{op}$ is the following category:

**–** objects are pairs $(A\_1, A\_2)$ of objects $A\_1$ of $\mathcal{C}$ and $A\_2$ of $\mathcal{L}$;

**–** morphisms $(A\_1, A\_2) \to (B\_1, B\_2)$ are pairs $(f\_1, f\_2)$ of morphisms $f\_1 : A\_1 \to B\_1$ in $\mathcal{C}$ and $f\_2 : B\_2 \to A\_2$ in $\mathcal{L}(A\_1)$, with the second components composed contravariantly.

We examine the categorical structure present in $\Sigma\_{\mathcal{C}}\mathcal{L}$ and $\Sigma\_{\mathcal{C}}\mathcal{L}^{op}$ for categorical models $\mathcal{L}$ in the sense of §5 (i.e., in case $\mathcal{L}$ has biproducts and supports $\Rightarrow$-, ${!(-)} \otimes (-)$-, and Cartesian $\multimap$-types). We believe this is a novel observation. We will make heavy use of it to define our AD algorithms and to prove them correct.

**Proposition 2.** $\Sigma\_{\mathcal{C}}\mathcal{L}$ has terminal object $\mathbb{1} = (\mathbb{1}, \mathbb{1})$, binary products $(A\_1, A\_2) \times (B\_1, B\_2) = (A\_1 \times B\_1, A\_2 \times B\_2)$, and exponentials $(A\_1, A\_2) \Rightarrow (B\_1, B\_2) = (A\_1 \Rightarrow (B\_1 \times (A\_2 \multimap B\_2)),\ A\_1 \Rightarrow B\_2)$.

Proof. We have (natural) bijections

$$\Sigma\_{\mathcal{C}}\mathcal{L}((A\_1, A\_2), (\mathbb{1}, \mathbb{1})) = \mathcal{C}(A\_1, \mathbb{1}) \times \mathcal{L}(A\_1)(A\_2, \mathbb{1}) \cong \mathbb{1} \times \mathbb{1} \cong \mathbb{1} \qquad \{\ \mathbb{1} \text{ terminal in } \mathcal{C} \text{ and } \mathcal{L}(A\_1)\ \}$$

$$\begin{aligned} &\Sigma\_{\mathcal{C}}\mathcal{L}((A\_1, A\_2), (B\_1 \times C\_1, B\_2 \times C\_2)) = \mathcal{C}(A\_1, B\_1 \times C\_1) \times \mathcal{L}(A\_1)(A\_2, B\_2 \times C\_2) \\ &\cong \mathcal{C}(A\_1, B\_1) \times \mathcal{C}(A\_1, C\_1) \times \mathcal{L}(A\_1)(A\_2, B\_2) \times \mathcal{L}(A\_1)(A\_2, C\_2) \\ &\cong \Sigma\_{\mathcal{C}}\mathcal{L}((A\_1, A\_2), (B\_1, B\_2)) \times \Sigma\_{\mathcal{C}}\mathcal{L}((A\_1, A\_2), (C\_1, C\_2)) \end{aligned} \qquad \left\{ \begin{array}{l} \times \text{product in } \mathcal{C} \text{ and } \mathcal{L}(A\_1) \\ \end{array} \right\}$$

$$\begin{aligned} &\Sigma\_{\mathcal{C}}\mathcal{L}((A\_1, A\_2) \times (B\_1, B\_2), (C\_1, C\_2)) = \Sigma\_{\mathcal{C}}\mathcal{L}((A\_1 \times B\_1, A\_2 \times B\_2), (C\_1, C\_2)) \\ &= \mathcal{C}(A\_1 \times B\_1, C\_1) \times \mathcal{L}(A\_1 \times B\_1)(A\_2 \times B\_2, C\_2) \\ &\cong \mathcal{C}(A\_1 \times B\_1, C\_1) \times \mathcal{L}(A\_1 \times B\_1)(A\_2, C\_2) \times \mathcal{L}(A\_1 \times B\_1)(B\_2, C\_2) && \{\ \times \text{ coproduct in } \mathcal{L}(A\_1 \times B\_1)\ \} \\ &\cong \mathcal{C}(A\_1 \times B\_1, C\_1) \times \mathcal{L}(A\_1)(A\_2, B\_1 \Rightarrow C\_2) \times \mathcal{L}(A\_1 \times B\_1)(B\_2, C\_2) && \{\ \Rightarrow\text{-types in } \mathcal{L}\ \} \\ &\cong \mathcal{C}(A\_1 \times B\_1, C\_1) \times \mathcal{L}(A\_1)(A\_2, B\_1 \Rightarrow C\_2) \times \mathcal{C}(A\_1 \times B\_1, B\_2 \multimap C\_2) && \{\ \text{Cartesian } \multimap\text{-types}\ \} \\ &\cong \mathcal{C}(A\_1 \times B\_1, C\_1 \times (B\_2 \multimap C\_2)) \times \mathcal{L}(A\_1)(A\_2, B\_1 \Rightarrow C\_2) && \{\ \times \text{ product in } \mathcal{C}\ \} \\ &\cong \mathcal{C}(A\_1, B\_1 \Rightarrow (C\_1 \times (B\_2 \multimap C\_2))) \times \mathcal{L}(A\_1)(A\_2, B\_1 \Rightarrow C\_2) && \{\ \Rightarrow \text{ exponential in } \mathcal{C}\ \} \\ &= \Sigma\_{\mathcal{C}}\mathcal{L}((A\_1, A\_2), (B\_1 \Rightarrow (C\_1 \times (B\_2 \multimap C\_2)), B\_1 \Rightarrow C\_2)) = \Sigma\_{\mathcal{C}}\mathcal{L}((A\_1, A\_2), (B\_1, B\_2) \Rightarrow (C\_1, C\_2)). \end{aligned}$$

We observe that we need $\mathcal{L}$ to have biproducts (equivalently: to be **CMon**-enriched) in order to show Cartesian closure. Further, we need linear $\Rightarrow$-types and Cartesian $\multimap$-types to construct exponentials.

**Proposition 3.** $\Sigma\_{\mathcal{C}}\mathcal{L}^{op}$ has terminal object $\mathbb{1} = (\mathbb{1}, \mathbb{1})$, binary products $(A\_1, A\_2) \times (B\_1, B\_2) = (A\_1 \times B\_1, A\_2 \times B\_2)$, and exponentials $(A\_1, A\_2) \Rightarrow (B\_1, B\_2) = (A\_1 \Rightarrow (B\_1 \times (B\_2 \multimap A\_2)),\ {!A\_1} \otimes B\_2)$.

Proof. We have (natural) bijections

$$\begin{aligned} &\Sigma\_{\mathcal{C}}\mathcal{L}^{op}((A\_1, A\_2), (\mathbb{1}, \mathbb{1})) = \mathcal{C}(A\_1, \mathbb{1}) \times \mathcal{L}(A\_1)(\mathbb{1}, A\_2) \cong \mathbb{1} \times \mathbb{1} \cong \mathbb{1} && \{\ \mathbb{1} \text{ terminal in } \mathcal{C}\text{, initial in } \mathcal{L}(A\_1)\ \} \\[4pt] &\Sigma\_{\mathcal{C}}\mathcal{L}^{op}((A\_1, A\_2), (B\_1 \times C\_1, B\_2 \times C\_2)) = \mathcal{C}(A\_1, B\_1 \times C\_1) \times \mathcal{L}(A\_1)(B\_2 \times C\_2, A\_2) \\ &\cong \mathcal{C}(A\_1, B\_1) \times \mathcal{C}(A\_1, C\_1) \times \mathcal{L}(A\_1)(B\_2, A\_2) \times \mathcal{L}(A\_1)(C\_2, A\_2) && \{\ \times \text{ product in } \mathcal{C}\text{, coproduct in } \mathcal{L}(A\_1)\ \} \\ &= \Sigma\_{\mathcal{C}}\mathcal{L}^{op}((A\_1, A\_2), (B\_1, B\_2)) \times \Sigma\_{\mathcal{C}}\mathcal{L}^{op}((A\_1, A\_2), (C\_1, C\_2)) \\[4pt] &\Sigma\_{\mathcal{C}}\mathcal{L}^{op}((A\_1, A\_2) \times (B\_1, B\_2), (C\_1, C\_2)) = \Sigma\_{\mathcal{C}}\mathcal{L}^{op}((A\_1 \times B\_1, A\_2 \times B\_2), (C\_1, C\_2)) \\ &= \mathcal{C}(A\_1 \times B\_1, C\_1) \times \mathcal{L}(A\_1 \times B\_1)(C\_2, A\_2 \times B\_2) \\ &\cong \mathcal{C}(A\_1 \times B\_1, C\_1) \times \mathcal{L}(A\_1 \times B\_1)(C\_2, A\_2) \times \mathcal{L}(A\_1 \times B\_1)(C\_2, B\_2) && \{\ \times \text{ product in } \mathcal{L}(A\_1 \times B\_1)\ \} \\ &\cong \mathcal{C}(A\_1 \times B\_1, C\_1) \times \mathcal{C}(A\_1 \times B\_1, C\_2 \multimap B\_2) \times \mathcal{L}(A\_1 \times B\_1)(C\_2, A\_2) && \{\ \text{Cartesian } \multimap\text{-types}\ \} \\ &\cong \mathcal{C}(A\_1 \times B\_1, C\_1 \times (C\_2 \multimap B\_2)) \times \mathcal{L}(A\_1 \times B\_1)(C\_2, A\_2) && \{\ \times \text{ product in } \mathcal{C}\ \} \\ &\cong \mathcal{C}(A\_1, B\_1 \Rightarrow (C\_1 \times (C\_2 \multimap B\_2))) \times \mathcal{L}(A\_1 \times B\_1)(C\_2, A\_2) && \{\ \Rightarrow \text{ exponential in } \mathcal{C}\ \} \\ &\cong \mathcal{C}(A\_1, B\_1 \Rightarrow (C\_1 \times (C\_2 \multimap B\_2))) \times \mathcal{L}(A\_1)({!B\_1} \otimes C\_2, A\_2) && \{\ {!(-)} \otimes (-)\text{-types}\ \} \\ &= \Sigma\_{\mathcal{C}}\mathcal{L}^{op}((A\_1, A\_2), (B\_1 \Rightarrow (C\_1 \times (C\_2 \multimap B\_2)), {!B\_1} \otimes C\_2)) = \Sigma\_{\mathcal{C}}\mathcal{L}^{op}((A\_1, A\_2), (B\_1, B\_2) \Rightarrow (C\_1, C\_2)). \end{aligned}$$

Observe that we need the biproduct structure of $\mathcal{L}$ to construct finite products in $\Sigma\_{\mathcal{C}}\mathcal{L}^{op}$. Further, we need Cartesian $\multimap$-types and ${!(-)} \otimes (-)$-types, but not biproducts, to construct exponentials.

# **7 Novel AD Algorithms as Source-Code Transformations**

As $\Sigma\_{\mathbf{CSyn}}\mathbf{LSyn}$ and $\Sigma\_{\mathbf{CSyn}}\mathbf{LSyn}^{op}$ are both Cartesian closed categories by §6, the universal property of $\mathbf{Syn}$ yields unique structure-preserving macros $\overrightarrow{\mathcal{D}}(-) : \mathbf{Syn} \to \Sigma\_{\mathbf{CSyn}}\mathbf{LSyn}$ (forward AD) and $\overleftarrow{\mathcal{D}}(-) : \mathbf{Syn} \to \Sigma\_{\mathbf{CSyn}}\mathbf{LSyn}^{op}$ (reverse AD), once we fix a compatible definition of the macros on $\mathbf{real}^n$ and the basic operations $\mathsf{op}$. By definition of equality in $\mathbf{Syn}$, $\Sigma\_{\mathbf{CSyn}}\mathbf{LSyn}$, and $\Sigma\_{\mathbf{CSyn}}\mathbf{LSyn}^{op}$, these macros automatically respect equational reasoning principles, in the sense that $t =\_{\beta\eta} s$ implies $\overrightarrow{\mathcal{D}}(t) =\_{\beta\eta+} \overrightarrow{\mathcal{D}}(s)$ and $\overleftarrow{\mathcal{D}}(t) =\_{\beta\eta+} \overleftarrow{\mathcal{D}}(s)$.

We need to choose suitable terms $D\mathsf{op}(x; y)$ and $D\mathsf{op}^t(x; y)$ to represent the forward- and reverse-mode derivatives of the basic operations $\mathsf{op} \in \mathrm{Op}^m\_{n\_1, \ldots, n\_k}$. For example, for elementwise multiplication $({*}) \in \mathrm{Op}^n\_{n, n}$, we can define $D({*})(x; y) = (\mathbf{fst}\ x) * (\mathbf{snd}\ y) + (\mathbf{snd}\ x) * (\mathbf{fst}\ y)$ and $D({*})^t(x; y) = \langle (\mathbf{snd}\ x) * y, (\mathbf{fst}\ x) * y \rangle$, where we use (linear) elementwise multiplication $({*}) \in \mathrm{LOp}^n\_{n; n}$. We represent derivatives as linear functions. This representation allows for efficient Jacobian-vector/adjoint product implementations, which avoid first calculating a full Jacobian and then taking a product. Such implementations are known to be important for achieving performant AD systems.
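As a concrete illustration (ours, not the paper's code) of these Jacobian-vector and adjoint products for elementwise multiplication, the sketch below computes $D({*})$ and $D({*})^t$ directly as linear maps, never materializing a Jacobian, and checks the adjoint property $w \bullet D({*})(x; y) = D({*})^t(x; w) \bullet y$:

```python
# Forward and reverse derivatives of elementwise multiplication (*),
# following D(*)(x; y)   = fst(x)*snd(y) + snd(x)*fst(y)   (JVP)
# and       D(*)^t(x; y) = (snd(x)*y, fst(x)*y)            (adjoint / VJP).

def mul(x1, x2):
    return [a * b for a, b in zip(x1, x2)]

def d_mul(x, y):
    # tangent of x1*x2 given tangents (y1, y2): x1*y2 + x2*y1, elementwise
    (x1, x2), (y1, y2) = x, y
    return [a * d + b * c for a, b, c, d in zip(x1, x2, y1, y2)]

def d_mul_t(x, w):
    # cotangent w pulled back to the two inputs: (x2*w, x1*w)
    x1, x2 = x
    return ([b * wi for b, wi in zip(x2, w)],
            [a * wi for a, wi in zip(x1, w)])

x = ([1.0, 2.0], [3.0, 4.0])
y = ([0.1, 0.2], [0.3, 0.4])
w = [1.0, 2.0]
jvp = d_mul(x, y)
vjp = d_mul_t(x, w)
lhs = sum(a * b for a, b in zip(w, jvp))
rhs = sum(a * b for a, b in zip(vjp[0], y[0])) + \
      sum(a * b for a, b in zip(vjp[1], y[1]))
assert abs(lhs - rhs) < 1e-9
```

Neither function ever builds the $n \times 2n$ Jacobian of elementwise multiplication; both run in $O(n)$.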

$$\begin{aligned} &\overrightarrow{\mathcal{D}}(\mathbf{real}^n)\_1 \stackrel{\text{def}}{=} \mathbf{real}^n \qquad \overrightarrow{\mathcal{D}}(\mathbf{real}^n)\_2 \stackrel{\text{def}}{=} \underline{\mathbf{real}^n} \qquad \overleftarrow{\mathcal{D}}(\mathbf{real}^n)\_1 \stackrel{\text{def}}{=} \mathbf{real}^n \qquad \overleftarrow{\mathcal{D}}(\mathbf{real}^n)\_2 \stackrel{\text{def}}{=} \underline{\mathbf{real}^n} \\ &\overrightarrow{\mathcal{D}}(\mathsf{op})\_1 \stackrel{\text{def}}{=} \mathsf{op} \qquad \overrightarrow{\mathcal{D}}(\mathsf{op})\_2 \stackrel{\text{def}}{=} x : \mathbf{real}^{n\_1} {*} \cdots {*}\, \mathbf{real}^{n\_k};\ y : \underline{\mathbf{real}^{n\_1}} {*} \cdots {*}\, \underline{\mathbf{real}^{n\_k}} \vdash D\mathsf{op}(x; y) : \underline{\mathbf{real}^m} \\ &\overleftarrow{\mathcal{D}}(\mathsf{op})\_1 \stackrel{\text{def}}{=} \mathsf{op} \qquad \overleftarrow{\mathcal{D}}(\mathsf{op})\_2 \stackrel{\text{def}}{=} x : \mathbf{real}^{n\_1} {*} \cdots {*}\, \mathbf{real}^{n\_k};\ y : \underline{\mathbf{real}^m} \vdash D\mathsf{op}^t(x; y) : \underline{\mathbf{real}^{n\_1}} {*} \cdots {*}\, \underline{\mathbf{real}^{n\_k}} \end{aligned}$$

For the AD transformations to be correct, it is important that these derivatives of language primitives are implemented correctly in the sense that

$$\llbracket x; y \vdash D\mathsf{op}(x; y) \rrbracket = D\llbracket \mathsf{op} \rrbracket \qquad \llbracket x; y \vdash D\mathsf{op}^t(x; y) \rrbracket = (D\llbracket \mathsf{op} \rrbracket)^t.$$

In practice, AD library developers tend to assume the subtle task of correctly implementing such derivatives Dop(x; y) and Dop<sup>t</sup> (x; y) whenever a new primitive operation op is added to the library.

The extension of the AD macros $\overrightarrow{\mathcal{D}}$ and $\overleftarrow{\mathcal{D}}$ to the full source language is now canonically determined, as the unique Cartesian closed functors that extend the previous definitions, following the categorical structure described in §6. Because of the counter-intuitive nature of the Cartesian closed structures on $\Sigma\_{\mathbf{CSyn}}\mathbf{LSyn}$ and $\Sigma\_{\mathbf{CSyn}}\mathbf{LSyn}^{op}$, we list the full macros explicitly in [36, Appx. A].

# **8 Proving Reverse and Forward AD Semantically Correct**

In this section, we show that the source-code transformations described in §7 correctly implement mathematical derivatives. We make correctness precise as the statement that, for programs $x : \tau \vdash t : \sigma$ between first-order types $\tau$ and $\sigma$, i.e. types not containing any function type constructors, we have $\llbracket \overrightarrow{\mathcal{D}}(t)\_2 \rrbracket = D\llbracket t \rrbracket$ and $\llbracket \overleftarrow{\mathcal{D}}(t)\_2 \rrbracket = (D\llbracket t \rrbracket)^t$, where $\llbracket - \rrbracket$ is the semantics of §5. The proof mainly consists of logical relations arguments over the semantics in $\Sigma\_{\mathbf{Diff}}\mathbf{Diff}\_{\mathbf{CM}}$ and $\Sigma\_{\mathbf{Diff}}\mathbf{Diff}\_{\mathbf{CM}}^{op}$. This logical relations proof can be phrased in elementary terms, but the resulting argument is technical and would be hard to discover. Instead, we prefer to phrase it in terms of a categorical subsconing construction, a more abstract and elegant perspective on logical relations. We discovered the proof by taking this categorical perspective, and, while we have verified the elementary argument (see [36, Appx. D]), we would not otherwise have come up with it.

### **8.1 Preliminaries**

**Subsconing** Logical relations arguments provide a powerful proof technique for demonstrating properties of typed programs. The arguments proceed by induction on the structure of types. Here, we briefly review the basics of categorical logical relations arguments, or subsconing constructions. We restrict to the level of generality that we need here, but we would like to point out that the theory applies much more generally.

Consider a Cartesian closed category $(\mathcal{C}, \mathbb{1}, \times, \Rightarrow)$. Suppose that we are given a functor $F : \mathcal{C} \to \mathbf{Set}$ to the category $\mathbf{Set}$ of sets and functions which preserves finite products in the sense that $F(\mathbb{1}) \cong \mathbb{1}$ and $F(C \times C') \cong F(C) \times F(C')$. Then, we can form the subscone of $F$, or category of logical relations over $F$, which is Cartesian closed, with a faithful Cartesian closed functor $\pi\_1$ to $\mathcal{C}$ which forgets about the predicates [24]:

**–** objects are pairs $(C, P)$ of an object $C$ of $\mathcal{C}$ and a predicate $P \subseteq F(C)$;

**–** morphisms $(C, P) \to (C', P')$ are morphisms $f : C \to C'$ in $\mathcal{C}$ such that $F(f)(P) \subseteq P'$;

**–** identities and composition are as in C;

**–** $(\mathbb{1}, F\mathbb{1})$ is the terminal object, and products and exponentials are given by
$$(C, P) \times (C', P') = (C \times C',\ \{\alpha \in F(C \times C') \mid F(\pi\_1)(\alpha) \in P,\ F(\pi\_2)(\alpha) \in P'\})$$
$$(C, P) \Rightarrow (C', P') = (C \Rightarrow C',\ \{F(\pi\_1)(\gamma) \mid \gamma \in F((C \Rightarrow C') \times C) \text{ s.t. } F(\pi\_2)(\gamma) \in P \text{ implies } F(\mathrm{ev})(\gamma) \in P'\}).$$

In typical applications, C can be the syntactic category of a language (like **Syn**), the codomain of a denotational semantics -<sup>−</sup> (like **Diff**), or a product of the above, if we want to consider n-ary logical relations. Typically, F tends to be a hom-functor (which always preserves products), like C(**1**, −) or C(C0, −), for some important object C0. When applied to the syntactic category **Syn** and F = **Syn**(**1**, −), the formulae for products and exponentials in the subscone clearly reproduce the usual recipes in traditional, syntactic logical relations arguments. As such, subsconing generalises standard logical relations methods.
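The product and exponential recipes can be computed explicitly in a toy setting: take finite sets for $\mathcal{C}$ and an identity-like (hence finite-product-preserving) $F$, so that a predicate is just a subset. The sketch below (ours; all names hypothetical) builds the exponential predicate exactly as in syntactic logical relations: a function is related iff it maps related arguments to related results.

```python
# Toy subscone over finite sets: an object is a pair (C, P) with P a subset
# of the finite set C; F is (up to encoding) the identity functor on Set.
from itertools import product

def product_pred(C, P, C2, P2):
    # predicate on C x C2: both components must be related
    carrier = set(product(C, C2))
    return carrier, {(a, b) for (a, b) in carrier if a in P and b in P2}

def exp_pred(C, P, C2, P2):
    # predicate on C => C2 (all functions, encoded as tuples over sorted(C)):
    # f is related iff f maps every element of P into P2
    Cs = sorted(C)
    funcs = set(product(sorted(C2), repeat=len(Cs)))
    def apply(f, a):
        return f[Cs.index(a)]
    related = {f for f in funcs if all(apply(f, a) in P2 for a in P)}
    return funcs, related

# (0,1) is the only pair relating {0} to {1} inside {0,1} x {0,1}:
assert len(product_pred({0, 1}, {0}, {0, 1}, {1})[1]) == 1
# 8 functions {0,1,2} -> {0,1}; exactly 2 send both 0 and 1 into {0}:
fs, rel = exp_pred({0, 1, 2}, {0, 1}, {0, 1}, {0})
assert len(fs) == 8 and len(rel) == 2
```

This is, of course, only the set-level shadow of the construction; in the paper $F$ is a hom-functor and the carriers are spaces of (pairs of) smooth maps.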

### **8.2 Subsconing for Correctness of AD**

We will apply the subsconing construction above to

$$\begin{aligned} \mathcal{C} &= \mathbf{Diff} \times \Sigma\_{\mathbf{Diff}}\mathbf{Diff}\_{\mathbf{CM}} & F &= (\mathbf{Diff} \times \Sigma\_{\mathbf{Diff}}\mathbf{Diff}\_{\mathbf{CM}})((\mathbb{R}, (\mathbb{R}, \underline{\mathbb{R}})), -) && \text{(forward AD)} \\ \mathcal{C} &= \mathbf{Diff} \times \Sigma\_{\mathbf{Diff}}\mathbf{Diff}\_{\mathbf{CM}}^{op} & F &= (\mathbf{Diff} \times \Sigma\_{\mathbf{Diff}}\mathbf{Diff}\_{\mathbf{CM}}^{op})((\mathbb{R}, (\mathbb{R}, \underline{\mathbb{R}})), -) && \text{(reverse AD)}, \end{aligned}$$

where we note that $\mathbf{Diff}$, $\Sigma\_{\mathbf{Diff}}\mathbf{Diff}\_{\mathbf{CM}}$, and $\Sigma\_{\mathbf{Diff}}\mathbf{Diff}\_{\mathbf{CM}}^{op}$ are Cartesian closed (given the arguments of §5 and §6) and that the product of Cartesian closed categories is again Cartesian closed. Let us write $\overrightarrow{\mathbf{SScone}}$ and $\overleftarrow{\mathbf{SScone}}$, respectively, for the resulting categories of logical relations.

Seeing that $\overrightarrow{\mathbf{SScone}}$ and $\overleftarrow{\mathbf{SScone}}$ are Cartesian closed, we obtain unique Cartesian closed functors $\llbracket - \rrbracket^f : \mathbf{Syn} \to \overrightarrow{\mathbf{SScone}}$ and $\llbracket - \rrbracket^r : \mathbf{Syn} \to \overleftarrow{\mathbf{SScone}}$ once we fix an interpretation of $\mathbf{real}^n$ and all operations $\mathsf{op}$. We write $P^f\_\tau$ and $P^r\_\tau$, respectively, for the relations $\pi\_2 \llbracket \tau \rrbracket^f$ and $\pi\_2 \llbracket \tau \rrbracket^r$. Let us interpret

$$\begin{aligned} \llbracket \mathbf{real}^n \rrbracket^f &\stackrel{\text{def}}{=} \big((\mathbb{R}^n, (\mathbb{R}^n, \underline{\mathbb{R}}^n)),\ \{(f, (g, h)) \mid f = g \text{ and } h = Df\}\big) \\ \llbracket \mathbf{real}^n \rrbracket^r &\stackrel{\text{def}}{=} \big((\mathbb{R}^n, (\mathbb{R}^n, \underline{\mathbb{R}}^n)),\ \{(f, (g, h)) \mid f = g \text{ and } h = (Df)^t\}\big) \\ \llbracket \mathsf{op} \rrbracket^f &\stackrel{\text{def}}{=} \big(\llbracket \mathsf{op} \rrbracket, (\llbracket \overrightarrow{\mathcal{D}}(\mathsf{op})\_1 \rrbracket, \llbracket \overrightarrow{\mathcal{D}}(\mathsf{op})\_2 \rrbracket)\big) \qquad \llbracket \mathsf{op} \rrbracket^r \stackrel{\text{def}}{=} \big(\llbracket \mathsf{op} \rrbracket, (\llbracket \overleftarrow{\mathcal{D}}(\mathsf{op})\_1 \rrbracket, \llbracket \overleftarrow{\mathcal{D}}(\mathsf{op})\_2 \rrbracket)\big), \end{aligned}$$

where we write $Df$ for the semantic derivative of $f$ (see §5). We need to verify, respectively, that $(\llbracket \mathsf{op} \rrbracket, (\llbracket \overrightarrow{\mathcal{D}}(\mathsf{op})\_1 \rrbracket, \llbracket \overrightarrow{\mathcal{D}}(\mathsf{op})\_2 \rrbracket))$ and $(\llbracket \mathsf{op} \rrbracket, (\llbracket \overleftarrow{\mathcal{D}}(\mathsf{op})\_1 \rrbracket, \llbracket \overleftarrow{\mathcal{D}}(\mathsf{op})\_2 \rrbracket))$ respect the logical relations $P^f$ and $P^r$. This follows immediately from the chain rule for multivariate differentiation, as long as we have implemented the derivatives of the basic operations $\mathsf{op}$ correctly:

$$\llbracket x; y \vdash D\mathsf{op}(x; y) \rrbracket = D\llbracket \mathsf{op} \rrbracket \qquad \text{and} \qquad \llbracket x; y \vdash D\mathsf{op}^t(x; y) \rrbracket = (D\llbracket \mathsf{op} \rrbracket)^t.$$

Writing $\mathbf{real}^{n\_1, \ldots, n\_k} \stackrel{\text{def}}{=} \mathbf{real}^{n\_1} {*} \cdots {*}\, \mathbf{real}^{n\_k}$ and $\mathbb{R}^{n\_1, \ldots, n\_k} \stackrel{\text{def}}{=} \mathbb{R}^{n\_1} \times \cdots \times \mathbb{R}^{n\_k}$, we compute

$$\begin{aligned} \{\mathsf{real}^{n\_1,\ldots,n\_k}\}^f &= ( (\mathbb{R}^{n\_1,\ldots,n\_k}, (\mathbb{R}^{n\_1,\ldots,n\_k}, \underline{\mathbb{R}}^{n\_1,\ldots,n\_k})), \{(f,(g,h)) \mid f = g, h = Df\} )\\ \{\mathsf{real}^{n\_1,\ldots,n\_k}\}^r &= ( (\mathbb{R}^{n\_1,\ldots,n\_k}, (\mathbb{R}^{n\_1,\ldots,n\_k}, \underline{\mathbb{R}}^{n\_1,\ldots,n\_k})), \{(f,(g,h)) \mid f = g, h = (Df)^t \} )\end{aligned}$$

since derivatives of tuple-valued functions are computed component-wise. (In fact, the corresponding facts hold more generally for any first-order type, as an iterated product of $\mathbf{real}^n$.) Suppose that $(f, (g, h)) \in P^f\_{\mathbf{real}^{n\_1, \ldots, n\_k}}$, i.e. $g = f$ and $h = Df$. Then, using the chain rule in the last step, we have

$$\begin{aligned} &(f, (g, h)); (\llbracket \mathsf{op} \rrbracket, (\llbracket \overrightarrow{\mathcal{D}}(\mathsf{op})\_1 \rrbracket, \llbracket \overrightarrow{\mathcal{D}}(\mathsf{op})\_2 \rrbracket)) = (f, (f, Df)); (\llbracket \mathsf{op} \rrbracket, (\llbracket \mathsf{op} \rrbracket, \llbracket x; y \vdash D\mathsf{op}(x; y) \rrbracket)) \\ &= (f, (f, Df)); (\llbracket \mathsf{op} \rrbracket, (\llbracket \mathsf{op} \rrbracket, D\llbracket \mathsf{op} \rrbracket)) = (f; \llbracket \mathsf{op} \rrbracket, (f; \llbracket \mathsf{op} \rrbracket,\ x \mapsto v \mapsto D\llbracket \mathsf{op} \rrbracket(f(x))(Df(x)(v)))) \\ &= (f; \llbracket \mathsf{op} \rrbracket, (f; \llbracket \mathsf{op} \rrbracket, D(f; \llbracket \mathsf{op} \rrbracket))) \in P^f\_{\mathbf{real}^m}. \end{aligned}$$

Similarly, if $(f, (g, h)) \in P^r\_{\mathbf{real}^{n\_1, \ldots, n\_k}}$, then, by the chain rule and linear algebra,
$$\begin{aligned} &(f, (g, h)); (\llbracket \mathsf{op} \rrbracket, (\llbracket \overleftarrow{\mathcal{D}}(\mathsf{op})\_1 \rrbracket, \llbracket \overleftarrow{\mathcal{D}}(\mathsf{op})\_2 \rrbracket)) = (f, (f, (Df)^t)); (\llbracket \mathsf{op} \rrbracket, (\llbracket \mathsf{op} \rrbracket, \llbracket x; y \vdash D\mathsf{op}^t(x; y) \rrbracket)) \\ &= (f, (f, (Df)^t)); (\llbracket \mathsf{op} \rrbracket, (\llbracket \mathsf{op} \rrbracket, (D\llbracket \mathsf{op} \rrbracket)^t)) = (f; \llbracket \mathsf{op} \rrbracket, (f; \llbracket \mathsf{op} \rrbracket,\ x \mapsto v \mapsto (Df)^t(x)((D\llbracket \mathsf{op} \rrbracket)^t(f(x))(v)))) \\ &= (f; \llbracket \mathsf{op} \rrbracket, (f; \llbracket \mathsf{op} \rrbracket,\ x \mapsto v \mapsto (Df(x); D\llbracket \mathsf{op} \rrbracket(f(x)))^t(v))) = (f; \llbracket \mathsf{op} \rrbracket, (f; \llbracket \mathsf{op} \rrbracket, (D(f; \llbracket \mathsf{op} \rrbracket))^t)) \in P^r\_{\mathbf{real}^m}. \end{aligned}$$

Consequently, we obtain our Cartesian closed functors $\llbracket - \rrbracket^f$ and $\llbracket - \rrbracket^r$.

Further, observe that $\Sigma\_{\llbracket - \rrbracket}\llbracket - \rrbracket(t\_1, t\_2) \stackrel{\text{def}}{=} (\llbracket t\_1 \rrbracket, \llbracket t\_2 \rrbracket)$ defines a Cartesian closed functor $\Sigma\_{\llbracket - \rrbracket}\llbracket - \rrbracket : \Sigma\_{\mathbf{CSyn}}\mathbf{LSyn} \to \Sigma\_{\mathbf{Diff}}\mathbf{Diff}\_{\mathbf{CM}}$. Similarly, we get a Cartesian closed functor $\Sigma\_{\llbracket - \rrbracket}\llbracket - \rrbracket^{op} : \Sigma\_{\mathbf{CSyn}}\mathbf{LSyn}^{op} \to \Sigma\_{\mathbf{Diff}}\mathbf{Diff}\_{\mathbf{CM}}^{op}$. As a consequence, the two squares below commute.

$$\begin{array}{ccc} \mathbf{Syn} & \xrightarrow{(\mathrm{id}, \overrightarrow{\mathcal{D}})} & \mathbf{Syn} \times \Sigma\_{\mathbf{CSyn}}\mathbf{LSyn} \\ {\scriptstyle \llbracket - \rrbracket^f} \downarrow & & \downarrow {\scriptstyle \llbracket - \rrbracket \times \Sigma\_{\llbracket - \rrbracket}\llbracket - \rrbracket} \\ \overrightarrow{\mathbf{SScone}} & \xrightarrow{\pi\_1} & \mathbf{Diff} \times \Sigma\_{\mathbf{Diff}}\mathbf{Diff}\_{\mathbf{CM}} \end{array} \qquad \begin{array}{ccc} \mathbf{Syn} & \xrightarrow{(\mathrm{id}, \overleftarrow{\mathcal{D}})} & \mathbf{Syn} \times \Sigma\_{\mathbf{CSyn}}\mathbf{LSyn}^{op} \\ {\scriptstyle \llbracket - \rrbracket^r} \downarrow & & \downarrow {\scriptstyle \llbracket - \rrbracket \times \Sigma\_{\llbracket - \rrbracket}\llbracket - \rrbracket^{op}} \\ \overleftarrow{\mathbf{SScone}} & \xrightarrow{\pi\_1} & \mathbf{Diff} \times \Sigma\_{\mathbf{Diff}}\mathbf{Diff}\_{\mathbf{CM}}^{op} \end{array}$$

Indeed, going around the squares in both directions defines Cartesian closed functors that agree in their action on $\mathbf{real}^n$ and all operations $\mathsf{op}$; so, by the universal property of $\mathbf{Syn}$, they must coincide. In particular, $(t, (\llbracket \overrightarrow{\mathcal{D}}(t)\_1 \rrbracket, \llbracket \overrightarrow{\mathcal{D}}(t)\_2 \rrbracket))$ is a morphism in $\overrightarrow{\mathbf{SScone}}$ and therefore respects the logical relations $P^f$, for any well-typed term $t$ of the source language of §3. Similarly, $(t, (\llbracket \overleftarrow{\mathcal{D}}(t)\_1 \rrbracket, \llbracket \overleftarrow{\mathcal{D}}(t)\_2 \rrbracket))$ is a morphism in $\overleftarrow{\mathbf{SScone}}$ and therefore respects the logical relations $P^r$.

Most of the work is now in place to show correctness of AD. We finish the proof below. To ease notation, we work with terms in a context with a single type. Doing so is not a restriction as our language has products, and the theorem holds for arbitrary terms between first-order types.

**Theorem 1 (Correctness of AD).** For programs $x : \tau \vdash t : \sigma$ between first-order types $\tau$ and $\sigma$,

$$\llbracket \overrightarrow{\mathcal{D}}(t)\_1 \rrbracket = \llbracket t \rrbracket \quad \llbracket \overrightarrow{\mathcal{D}}(t)\_2 \rrbracket = D\llbracket t \rrbracket \qquad \llbracket \overleftarrow{\mathcal{D}}(t)\_1 \rrbracket = \llbracket t \rrbracket \quad \llbracket \overleftarrow{\mathcal{D}}(t)\_2 \rrbracket = (D\llbracket t \rrbracket)^t.$$

Proof (sketch, see [36, Appx. B] for details). To show that $\llbracket \overrightarrow{\mathcal{D}}(t)\_1 \rrbracket(x) = \llbracket t \rrbracket(x)$ and $\llbracket \overrightarrow{\mathcal{D}}(t)\_2 \rrbracket(x)(v) = D\llbracket t \rrbracket(x)(v)$, we choose a smooth curve $\gamma : \mathbb{R} \to \llbracket \tau \rrbracket$ such that $\gamma(0) = x$ and $D\gamma(0)(1) = v$, and use that $t$ respects the logical relations $P^f$.

To show that $\llbracket \overleftarrow{\mathcal{D}}(t)\_1 \rrbracket(x) = \llbracket t \rrbracket(x)$ and $\llbracket \overleftarrow{\mathcal{D}}(t)\_2 \rrbracket(x)(v) = (D\llbracket t \rrbracket(x))^t(v)$, we choose smooth curves $\gamma\_i : \mathbb{R} \to \llbracket \tau \rrbracket$ such that $\gamma\_i(0) = x$ and $D\gamma\_i(0)(1) = e\_i$, for all standard basis vectors $e\_i$ of $\llbracket \overleftarrow{\mathcal{D}}(\tau)\_2 \rrbracket \cong \mathbb{R}^N$. It then follows that $\llbracket \overleftarrow{\mathcal{D}}(t)\_1 \rrbracket(x) = \llbracket t \rrbracket(x)$ and $e\_i \bullet \llbracket \overleftarrow{\mathcal{D}}(t)\_2 \rrbracket(x)(v) = e\_i \bullet (D\llbracket t \rrbracket(x))^t(v)$, as $t$ respects the logical relations $P^r$.


**Fig. 5.** Typing rules for the applied target language, to extend the source language.

# **9 Practical Relevance and Implementation**

Popular functional languages, such as Haskell and OCaml, do not natively support linear types. As such, the transformations described in this paper may seem hard to implement. However, as we summarize in this section (and detail in [36, Appx. C]), we can easily implement the limited linear types needed for the transformations as abstract data types, using merely a basic module system.

Specifically, we consider, as an alternative applied target language for our transformations, the extension of the source language of §3 with the terms and types of Fig. 5. We can define a faithful translation $(-)^\dagger$ from our linear target language of §4 to this language: define $({!\tau} \otimes \sigma)^\dagger \stackrel{\text{def}}{=} \mathbf{Tens}(\tau^\dagger, \sigma^\dagger)$, $(\tau \multimap \sigma)^\dagger \stackrel{\text{def}}{=} \mathbf{LFun}(\tau^\dagger, \sigma^\dagger)$, $(\mathbf{real}^n)^\dagger \stackrel{\text{def}}{=} \mathbf{real}^n$, and extend $(-)^\dagger$ structurally recursively, letting it preserve all other type formers. We then translate $(x\_1 : \tau\_1, \ldots, x\_n : \tau\_n;\ y : \sigma \vdash t : \rho)^\dagger \stackrel{\text{def}}{=} x\_1 : \tau\_1^\dagger, \ldots, x\_n : \tau\_n^\dagger \vdash t^\dagger : (\sigma \multimap \rho)^\dagger$ and $(x\_1 : \tau\_1, \ldots, x\_n : \tau\_n \vdash t : \sigma)^\dagger \stackrel{\text{def}}{=} x\_1 : \tau\_1^\dagger, \ldots, x\_n : \tau\_n^\dagger \vdash t^\dagger : \sigma^\dagger$. We believe an interested reader can fill in the details. This exhibits the linear target language as a sublanguage of the applied target language. The applied target language merely collapses the distinction between linear and Cartesian types, and it adds the construct $\mathbf{lapp}(t, s)$ for practical usability and to ensure that our adequacy result below is meaningful.

We can implement the API of Fig. 5 as a module that defines the abstract types $\mathbf{LFun}(\tau, \sigma)$, under the hood implemented as a plain function type $\tau \to \sigma$, and $\mathbf{Tens}(\tau, \sigma)$, implemented as a list of pairs $\mathbf{List}(\tau * \sigma)$. Then, the required terms of Fig. 5 can be implemented as follows, using the standard idioms $[\,]$, $t :: s$, and $\mathbf{fold}\ op\ \mathbf{over}\ x\ \mathbf{in}\ t\ \mathbf{from}\ acc = init$ for empty lists, cons-ing, and folding:

$$\begin{aligned} &0\_{\mathbb{1}} \stackrel{\text{def}}{=} \langle\rangle \qquad t +\_{\mathbb{1}} s \stackrel{\text{def}}{=} \langle\rangle \qquad 0\_{\tau * \sigma} \stackrel{\text{def}}{=} \langle 0\_\tau, 0\_\sigma \rangle \qquad t +\_{\tau * \sigma} s \stackrel{\text{def}}{=} \langle \mathbf{fst}\ t +\_\tau \mathbf{fst}\ s,\ \mathbf{snd}\ t +\_\sigma \mathbf{snd}\ s \rangle \\ &0\_{\tau \to \sigma} \stackrel{\text{def}}{=} \lambda\\_.\, 0\_\sigma \qquad t +\_{\tau \to \sigma} s \stackrel{\text{def}}{=} \lambda x.\, t\ x +\_\sigma s\ x \qquad 0\_{\mathbf{LFun}(\tau, \sigma)} \stackrel{\text{def}}{=} \lambda\\_.\, 0\_\sigma \qquad t +\_{\mathbf{LFun}(\tau, \sigma)} s \stackrel{\text{def}}{=} \lambda x.\, t\ x +\_\sigma s\ x \\ &0\_{\mathbf{Tens}(\tau, \sigma)} \stackrel{\text{def}}{=} [\,] \qquad t +\_{\mathbf{Tens}(\tau, \sigma)} s \stackrel{\text{def}}{=} \mathbf{fold}\ x :: acc\ \mathbf{over}\ x\ \mathbf{in}\ t\ \mathbf{from}\ acc = s \\ &\mathbf{lid} \stackrel{\text{def}}{=} \lambda x.\, x \qquad t \mathop{;;} s \stackrel{\text{def}}{=} \lambda x.\, s\ (t\ x) \qquad \mathbf{lapp}(t, s) \stackrel{\text{def}}{=} t\ s \qquad \mathbf{lswap}\ t \stackrel{\text{def}}{=} \lambda x. \lambda y.\, t\ y\ x \qquad \mathbf{leval}\ t \stackrel{\text{def}}{=} \lambda x.\, x\ t \\ &\{(t, -)\} \stackrel{\text{def}}{=} \lambda x.\, \langle t, x \rangle :: [\,] \qquad \mathbf{lcur}^{-1} t \stackrel{\text{def}}{=} \lambda z.\, \mathbf{fold}\ t\ (\mathbf{fst}\ x)\ (\mathbf{snd}\ x) + acc\ \mathbf{over}\ x\ \mathbf{in}\ z\ \mathbf{from}\ acc = 0 \\ &\mathbf{lfst} \stackrel{\text{def}}{=} \lambda x.\, \mathbf{fst}\ x \qquad \mathbf{lsnd} \stackrel{\text{def}}{=} \lambda x.\, \mathbf{snd}\ x \qquad \mathbf{lpair}(t, s) \stackrel{\text{def}}{=} \lambda x.\, \langle t\ x, s\ x \rangle \end{aligned}$$
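A direct Python rendering of part of this module (ours; the names are hypothetical and merely mirror Fig. 5) represents $\mathbf{LFun}(\tau, \sigma)$ as a plain function and $\mathbf{Tens}(\tau, \sigma)$ as a list of pairs; $\mathbf{lcur}^{-1}$ becomes a fold that sums the results of applying a curried linear map to each pair:

```python
# LFun(tau, sigma) is a plain Python function; Tens(tau, sigma) is a list
# of pairs.  The zero/plus structure and a few terms of the Fig. 5 API.

def zero_tens():
    return []

def plus_tens(t, s):
    # fold (x :: acc) over x in t, from acc = s  (i.e. list append)
    acc = s
    for x in reversed(t):
        acc = [x] + acc
    return acc

def lid():
    return lambda x: x

def lcomp(t, s):                 # t ;; s : linear composition
    return lambda x: s(t(x))

def lpair(t, s):
    return lambda x: (t(x), s(x))

def singleton_tens(t):           # {(t, -)} : x |-> (t, x) :: []
    return lambda x: [(t, x)]

def lcur_inv(t, zero, plus):
    # lcur^{-1} t : fold (t (fst x) (snd x) + acc) over x in z, from acc = 0
    def run(z):
        acc = zero
        for x in z:
            acc = plus(t(x[0])(x[1]), acc)
        return acc
    return run

# Example: t x = (v |-> x * v); summing a curried bilinear map over a tensor.
t = lambda x: (lambda v: x * v)
f = lcur_inv(t, 0.0, lambda a, b: a + b)
assert f([(2.0, 3.0), (4.0, 0.5)]) == 8.0   # 2*3 + 4*0.5
assert plus_tens([(1, 2)], [(3, 4)]) == [(1, 2), (3, 4)]
```

Here linearity of an `LFun` is a module invariant rather than a checked property, which is exactly the point of hiding the representations behind an abstract type.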

Our denotational semantics extends to this applied target language and is adequate with respect to the operational semantics induced by the suggested implementation. Further, our correctness proofs of the induced source-code translations also transfer to this applied setting, and they can be usefully phrased as manual, extensible logical relations proofs. As an application, we can extend our source language with higher-order primitives, like $\mathbf{map} \in \mathbf{Syn}((\mathbf{real} \to \mathbf{real}) * \mathbf{real}^n, \mathbf{real}^n)$, to "map" functions over the black-box arrays $\mathbf{real}^n$. Then, our proofs extend to show that their correct forward and reverse derivatives are

$$\begin{aligned} \overrightarrow{\mathcal{D}}(\mathbf{map})\_1(f, v) &\stackrel{\text{def}}{=} \mathbf{map}(f; \mathbf{fst},\ v) & \overrightarrow{\mathcal{D}}(\mathbf{map})\_2(f, v)(g, w) &\stackrel{\text{def}}{=} \mathbf{map}\ g\ v + \mathbf{zipWith}\ (f; \mathbf{snd})\ v\ w \\ \overleftarrow{\mathcal{D}}(\mathbf{map})\_1(f, v) &\stackrel{\text{def}}{=} \mathbf{map}(f; \mathbf{fst},\ v) & \overleftarrow{\mathcal{D}}(\mathbf{map})\_2(f, v)(w) &\stackrel{\text{def}}{=} \langle \mathbf{zip}\ v\ w,\ \mathbf{zipWith}\ (f; \mathbf{snd})\ v\ w \rangle, \end{aligned}$$

where we use the standard functional programming idioms **zip** and **zipWith**. Here, we can operate directly on the internal representations of $\mathbf{LFun}(\tau, \sigma)$ and $\mathbf{Tens}(\tau, \sigma)$, as the definitions of derivatives of primitives live inside our module.
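The map derivatives above can be sanity-checked with a small sketch (ours; here $f$ is already in transformed form, returning a primal paired with a linear derivative action, so $f; \mathbf{fst}$ and $f; \mathbf{snd}$ are the two projections):

```python
# Forward and reverse derivatives of map, following
#   fwd2 (f, v) (g, w) = map g v + zipWith (f;snd) v w
#   rev2 (f, v) w      = (zip v w, zipWith (f;snd) v w)
# where f(x) = (primal, linear derivative action at x).

def map_primal(f, v):
    return [f(x)[0] for x in v]

def map_fwd(f, v, g, w):
    # tangent from perturbing the function (g) plus the array (w)
    return [g(x) + f(x)[1](wx) for x, wx in zip(v, w)]

def map_rev(f, v, w):
    # cotangent for f as a Tens (list of pairs); cotangent for v pointwise
    return (list(zip(v, w)), [f(x)[1](wx) for x, wx in zip(v, w)])

# f(x) = x^2 with derivative action dv |-> 2*x*dv; hold f fixed (g = 0).
sq = lambda x: (x * x, lambda dv: 2 * x * dv)
v, w = [1.0, 2.0, 3.0], [0.1, 0.1, 0.1]
assert map_primal(sq, v) == [1.0, 4.0, 9.0]
tang = map_fwd(sq, v, lambda x: 0.0, w)
assert all(abs(a - b) < 1e-12 for a, b in zip(tang, [0.2, 0.4, 0.6]))
cot = map_rev(sq, v, w)
assert cot[0] == [(1.0, 0.1), (2.0, 0.1), (3.0, 0.1)]
```

As the text notes, the reverse derivative produces a `Tens` value (the list of pairs `zip v w`) for the function argument, exactly the internal representation the module hides from clients.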

# **10 Related and Future Work**

**Related work** This work is closely related to [20], which introduced a similar semantic correctness proof for a version of forward-mode AD, using a subsconing construction. A major difference is that this paper also phrases and proves correctness of reverse-mode AD on a λ-calculus and relates reverse-mode to forward-mode AD. Using a syntactic logical relations proof instead, [5] also proves correctness of forward-mode AD. Again, it does not address reverse AD.

[11] proposes a construction similar to that of §6 and relates it to the differential λ-calculus. That work develops sophisticated axiomatics for semantic reverse differentiation. However, it neither relates the semantics to a source-code transformation, nor discusses differentiation of higher-order functions. Our construction of differentiation with a (biadditive) linear target language might remind the reader of differential linear logic [15]. In differential linear logic, (forward) differentiation is a first-class operation in a (biadditive) linear language. By contrast, in our treatment, differentiation is a meta-operation.

Importantly, [16] describes and implements what are essentially our source-code transformations, though restricted to first-order functions and scalars. [37] sketches an extension of the reverse-mode transformation to higher-order functions in essentially the same way as proposed in this paper. It does not motivate or derive the algorithm or show its correctness. Nevertheless, this short paper discusses important practical considerations for implementing the algorithm, and it discusses a dependently typed variant of the algorithm.

Next, there are various lines of work relating to correctness of reverse-mode AD that we consider less similar to our work. For example, [28] define and prove correct a formulation of reverse-mode AD on a higher-order language that depends on a non-standard operational semantics, essentially a form of symbolic execution. [2] does something similar for reverse-mode AD on a first-order language extended with conditionals and iteration. [8] defines an AD algorithm in a simply typed λ-calculus with linear negation (essentially, the continuation-based AD of [20]) and proves it correct using operational techniques. Further, they show that this algorithm corresponds to reverse-mode AD under a non-standard operational semantics (with the "linear factoring rule"). These formulations of reverse-mode AD all depend on non-standard run-times and fall into the category of "define-by-run" formulations of reverse-mode AD. Meanwhile, we are concerned with "define-then-run" formulations: source-code transformations producing differentiated code at compile-time, which can then be optimized during compilation with existing compiler tool-chains.

Finally, there is a long history of work on reverse-mode AD, though almost none of it applies the technique to higher-order functions. A notable exception is [31], which gives an impressive source-code transformation implementation of reverse AD in Scheme. While very efficient, this implementation crucially uses mutation. Moreover, the transformation is complex and correctness is not considered. More recently, [38] describes a much simpler implementation of a reverse AD code transformation, again very performant. However, the transformation is quite different from the one considered in this paper as it relies on a combination of delimited continuations and mutable state. Correctness is not considered, perhaps because of the semantic complexities introduced by impurity.

Our work adds to the existing literature by presenting (to our knowledge) the first principled and pure define-then-run reverse AD algorithm for a higher-order language, by arguing its practical applicability, and by proving semantic correctness of the algorithm.

**Future work** We plan to build a practical, verified AD library based on the methods introduced in this paper. This will involve calculating the derivative of many first- and higher-order primitives according to our method.

Next, we aim to extend our method to other expressive language features. We conjecture that the method extends to source languages with variant and inductive types as long as one makes the target language a linear dependent type theory [10,34]. Indeed, the dimension of (co)tangent spaces to a disjoint union of spaces depends on the choice of base point. The required colimits to interpret such types in Σ_C L and Σ_C L^op should exist by standard results about arrow and container categories [3]. We are hopeful that the method can also be made to apply to source languages with general recursion by calculating the derivative of fixpoint combinators similarly to our calculation for **map**. The correctness proof will then rely on a domain-theoretic generalisation of our techniques [35].

**Acknowledgements** This project has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No. 895827. We thank Michael Betancourt, Philip de Bruin, Bob Carpenter, Mathieu Huot, Danny de Jong, Ohad Kammar, Gabriele Keller, Pieter Knops, Curtis Chin Jen Sem, Amir Shaikhha, Tom Smeding, and Sam Staton for helpful discussions about automatic differentiation.

# **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/ 4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Sound and Complete Concolic Testing for Higher-order Functions

Shu-Hung You, Robert Bruce Findler, and Christos Dimoulas

Northwestern University, Evanston, IL, USA shu-hung.you@eecs.northwestern.edu, robby@cs.northwestern.edu, chrdimo@northwestern.edu

Abstract. Higher-order functions have become a staple of modern programming languages. However, such values stymie concolic testers, as the SMT solvers at their hearts are inherently first-order.

This paper lays formal foundations for concolic testing of higher-order functional programs. Three ideas enable our results: (i) our tester considers only program inputs in a canonical form; (ii) it collects novel constraints from the evaluation of the canonical inputs to search the space of inputs with partial help from an SMT solver; and (iii) it collects constraints from canonical inputs even when they are arguments to concretized calls. We prove that (i) concolic evaluation is sound with respect to concrete evaluation; (ii) modulo concretization and SMT solver incompleteness, the search for a counter-example succeeds if a user program has a bug; and (iii) this search amounts to directed evolution of inputs targeting hard-to-reach corners of the program.

# 1 Introduction

Concolic testing [8, 20] allows symbolic evaluation to leverage concrete inputs as it attempts to uncover bugs. The role of concrete inputs is twofold. First, they help symbolic evaluation focus on one control-flow path at a time, thus allowing the exploration of the behavior of a user program in an incremental and directed fashion. Second, they enable concretization, permitting symbolic evaluation to seamlessly switch to concrete evaluation and back, thus facilitating interoperability with external libraries. A testament to the success of concolic testing is its adaptation to a gamut of linguistic, platform and application settings [3, 6, 7, 12, 14, 15, 17, 21, 22, 23, 25, 29, 30, 35, 37, 38, 39, 41, 43].

However, concolic testers' generation of inputs hinges on the power of SMT solvers. That is, at the end of a run of a user program, the concolic tester constructs a formula whose solution determines the next input. Alas, SMT solvers largely deal with first-order formulas that cannot capture higher-order properties of inputs. As a result, existing concolic testers struggle with JavaScript, Python or Racket components whose inputs are often higher-order functions and fall back to incomplete approximations [17, 28, 31, 36].

The goal of this paper is to introduce provably correct foundations that lift concolic testing to the world of higher-order functions.

```
call-twice = λf. let i = f (equals 2) in
                 let j = f (equals 30) in
                 let k = f (equals 7) in
                 (cond [!(i = 12) 1]
                       [!(j = 5) 2]
                       [!(k = -2) 3]
                       [else error])

error-trigger = λg. (cond [(g 2) 12]
                          [(g 30) 5]
                          [else -2])
```
Figure 1: One Argument Call Is Not Enough; Example & Error-Triggering Input

There are three interdependent challenges for the design of a correct higher-order concolic tester. First, a higher-order concolic tester needs to be able to generate sufficiently complex function inputs to explore the behavior of a user program. Even in simple higher-order programs, this set of inputs includes functions with sophisticated structure. The left-hand side of figure 1 displays one such program, call-twice. It consumes a higher-order function **f** that, when given a predicate on numbers, returns a number. It calls **f** with three different predicates that return true if their input is 2, 30 and 7 respectively. If the result of any of these calls is different from a specific number, call-twice terminates successfully; otherwise call-twice errors. Hence, only a fine-tuned input can make call-twice error. In particular, it has to be a function that calls its argument at least twice with different numbers and returns the right result in each case, like the counterexample on the right-hand side of the figure.

The second challenge is that a higher-order concolic tester needs to be able to generate structurally complex function inputs in a directed manner. Specifically, to preserve the character of first-order concolic testing, a higher-order concolic tester must start with a default input that evolves, with each run of the user program and the help of an SMT solver, to a new input that aims to exercise a previously unexplored region of the program. Returning to the example from figure 1, a higher-order concolic tester should start from a simple **f** such as a constant function and then use hints from the evaluation of the example to add, inside **f**, appropriate calls to **f**'s argument, targeting the last branch of call-twice's cond expression.

```
date< = λd1. λd2. (or ((date-year d1) < (date-year d2))
                      ((date-month d1) < (date-month d2))
                      ((date-day d1) < (date-day d2)))
main = λdates. (let sorted-dates = (sort dates date<) in …)
```
Figure 2: Broken Argument for a Library Function

The third challenge is that, in a higher-order setting, concretization demands that the concolic tester is ready to concretize any call to a higher-order function. For example, the main function in figure 2 takes as input a list of dates, calls sort with the comparison function date< and expects the results to be lexicographically sorted (as there are many reasons why sorting is necessary, we leave the details to the imagination of the reader). If sort is a library function whose implementation is inaccessible, then the concolic tester has to concretize the call to sort and disable symbolic evaluation for the extent of that call. Unfortunately, date< does not implement the lexicographical order, and discovering this requires the concolic tester to track symbolically the flow of values in and out of date< in order to generate a list of dates that exhibits the bug. In other words, the concolic tester should be able to perform "partial" concretization so that date< interacts with sort in a concrete manner while the evaluation of date< still produces the symbolic information the tester needs.

Our paper contributes the first formal model for a concolic tester for higher-order functions that meets all three challenges:


The remainder of the paper is organized as follows. Section 2 gives an in-depth, by-example presentation of our approach to higher-order concolic testing. Section 3 presents our formal model and section 4 establishes its correctness properties. Section 5 describes a proof-of-concept implementation of our model that provides evidence that the model is a reasonable basis for the development of effective higher-order testers. Finally, section 6 places our results in the context of related work and section 7 offers some concluding thoughts.

# 2 Higher-order Concolic Testing by Example

The linguistic setting of our exposition of concolic testing is a small call-by-value dynamically-typed functional language without mutable state. Furthermore, we represent bugs explicitly as the term error and assume that user programs come with type-like input specifications.

# 2.1 First-Order Concolic Execution in a Nutshell

The goal of a concolic tester is to find a value for the inputs to a user program that causes the execution to reach error. To do so, the tester runs the user program in a concolic loop with a different input for each loop iteration. There are two differences between concolic evaluation and concrete evaluation. To explain them, consider the user program in the left-hand column of figure 3, where **X** represents the numeric input.


Figure 3: A First, First-order Concolic Example

The first difference is that, instead of concrete values, concolic evaluation utilizes values of the form ‹**t**›, where **t** is a first-order formula over the input variables that codifies the provenance of the value. Concretely, assume that in the first run of our example program the concolic tester picks the concrete input 0. Instead of just starting the evaluation of the program by replacing **X** with 0, the concolic machine keeps an environment that maps **X** to 0 and runs the program with the concolic value ‹**X**› as the input. The concrete counterpart of a concolic value can be computed from the concrete values in the environment and the (first-order) formula **t** at any point during concolic evaluation.

To kick off concolic evaluation, the concolic machine evaluates the test expression of the outer cond of the example. Specifically, the primitive operation × detects that its input is ‹**X**› and returns ‹**X**×**X**›. Even though the concrete counterparts of both of these concolic values are 0, they bear a different relation to the input **X**. The concolic machine proceeds with the rest of the evaluation of the test expression, yielding ‹**X**×**X** - **X** - 992 = 0›. At this point, the concolic machine uses the concrete counterpart of the concolic value and thus decides to follow the "else" branch of the outer cond. Hence, the first run does not trigger error.

The second characteristic of concolic evaluation is the connections it creates between the inputs and the evaluation of a user program. Specifically, the concolic machine logs the concolic value of the test expressions of cond expressions in the user program in the order they are evaluated; we refer to these entries of the log as path constraints. The middle column of figure 3 shows the log (and the inputs) for the run of our example when **X** is 0. Since only one cond expression is evaluated, the log contains a single path constraint that the concrete counterpart of the concolic value ‹**X**×**X** - **X** - 992 = 0› is false, that is, the first branch of the cond was not taken. Intuitively, the path constraint connects the evaluation of a cond expression with the input to the program via concolic values. After the first run, the concolic tester asks the SMT solver for an input where **X**×**X** - **X** - 992 = 0 holds, forcing the branch to go the other way. The SMT solver may respond with 32, leading to the run represented in the right-hand column of figure 3. That run again fails to trigger the error, but has a log showing that the first branch of the outer cond was taken this time because **X**×**X** - **X** - 992 = 0 is true. It also has another constraint indicating that the first branch of the inner cond was not taken because **X** < 0 is false. At this point, the concolic tester can formulate a new SMT problem that requires both **X**×**X** - **X** - 992 = 0 and **X** < 0 to be true. The problem is satisfiable and the SMT solver replies that the new concrete value for **X** should be -31, which uncovers the error.
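The loop just described can be sketched in executable form. The following Python sketch (all names invented here, not part of the paper's formalism; the program is reconstructed from the description above) pairs each branch test with a logged path constraint and uses a brute-force search over a small integer range in place of the SMT solver:

```python
def example_program(x, log):
    # User program from figure 3 (reconstructed): the outer branch tests
    # x*x - x - 992 = 0 and the inner branch tests x < 0.
    t1 = (x * x - x - 992 == 0)
    log.append(("x * x - x - 992 == 0", t1))
    if t1:
        t2 = (x < 0)
        log.append(("x < 0", t2))
        return "error" if t2 else "ok"
    return "ok"

def solve(constraints, lo=-1000, hi=1000):
    # Brute-force stand-in for the SMT solver: find an x whose truth
    # value for every predicate matches the logged expectation.
    for x in range(lo, hi + 1):
        if all(eval(pred, {"x": x}) == want for pred, want in constraints):
            return x
    return None

def concolic_loop():
    x, seen = 0, set()  # initial concrete input
    while x is not None and x not in seen:
        seen.add(x)
        log = []
        if example_program(x, log) == "error":
            return x  # concrete counter-example
        # Negate the last path constraint, keeping the prefix fixed
        # (a simple strategy; real testers search more carefully).
        pred, want = log[-1]
        x = solve(log[:-1] + [(pred, not want)])
    return None
```

Starting from the input 0, negating the logged constraint steers the search toward an integer root of X×X - X - 992 (the two roots are 32 and -31), and the negative root drives evaluation into the error branch.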

### 2.2 From Numbers to Function Inputs

As described so far, concolic testing cannot handle inputs that are not numbers or other data types that SMT solvers understand. The concolic tester relies solely on a solver to generate new inputs and, for that, it needs to prepare a first-order problem that the solver can solve. Our first insight for overcoming this restriction is to split the generation of function inputs into two subproblems:


As with many problems that involve higher-order functions, the first subproblem is the hard one. The solution for the second subproblem falls out of that for the first one, exploiting the natural co- and contravariance of higher-order functions. So, we first focus on first-order function inputs and we return to higher-order inputs in section 2.5.

The left-hand column in figure 4 shows a program whose input **F** is a first-order function from numbers to numbers. One of the many functions that can trigger error in this example is λ**x**. 2-**x**. However, a key aspect of our approach is recognizing that we care only about the behavior of the input when given 1 and 2. Since the program calls **F** with only those arguments, other arguments are irrelevant. In general, any program that terminates calls its input a finite number of times, so the concolic tester can model first-order function inputs as functions that look up values from a table, which we represent with a case expression.

As with non-function inputs, the concolic tester starts with the simplest possible function input: λ**x**. (case **x**), as shown in the middle column of figure 4.


Figure 4: First-order Input

This function looks up its argument in an empty table and always returns 0. If the concolic machine treated this function as a first-order input, it would record that the first branch of cond was not taken because ‹(**F** 1) × 3 = (**F** 2) + 3› is false. This formula, however, involves function symbols, which SMT solvers cannot handle when higher-order functions come into play. Thus the concolic machine does not record the constraint and instead simply reduces all applications of **F** en route to the concolic value of the test expression. Unfortunately, this first function input does not help the concolic tester make progress. Since the input returns the constant 0 for any argument, the concolic value of the test loses any connection to **F** and the concolic tester does not have much leverage to adjust **F**'s behavior and affect the evaluation of the program.

To rectify the situation, our concolic tester aims to generate a new input with a shape that gives the tester increased control over **F**'s behavior. The pivotal idea that enables the input evolution process is that the concolic machine logs so-called input constraints. That is, in addition to the path constraints of the user program, it also records the values that the user program provides to **F**, or any other function input. Back to the example, the evaluation records two input constraints: one for argument 1 and one for 2. The middle column of figure 4 shows the new log entries along with the path constraint from the evaluation of the cond expression.

With the input constraints from the log, the concolic tester can construct a second function input as shown in the right-hand column of figure 4. This new function input has a case expression with two clauses: one for when the argument is 1 and one for when it is 2. Furthermore, the concolic tester introduces two fresh input variables **Y** and **Z** as the actions of the two clauses. The initial values for these two new inputs are both 0. However, exactly because the results of the function are input variables rather than mere constants, the concolic tester can configure the values for these inputs to trigger the error with the help of an SMT solver. Specifically, the concolic value of the test of the first branch of the cond expression in the example becomes ‹**Y**×3 = **Z**+3›, as shown in the log. This problem has solutions and the SMT solver discovers that **Y**=1 and **Z**=0 are sufficient to "switch" the evaluation of the conditional, which triggers the error.

In sum, to handle first-order function inputs, the concolic tester starts with the simplest possible function, records input constraints that describe the arguments that the function consumes, uses the constraints to generate a new function that, in turn, introduces fresh inputs, and finally employs the SMT solver to fine-tune the values for these inputs.
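The steps above can be mirrored in code. In this Python sketch (invented names; the one-branch program is a hypothetical stand-in for figure 4), a function input is a lookup table whose misses return 0 and whose calls are logged as input constraints; evolution adds clauses returning fresh variables that a brute-force search tunes:

```python
def make_table_input(table, env, input_log):
    # table maps a concrete argument to a fresh-variable name; env gives
    # each fresh variable its current concrete value.
    def F(arg):
        input_log.append(arg)  # input constraint: the argument F consumed
        var = table.get(arg)
        return env[var] if var is not None else 0
    return F

def program(F):
    # Hypothetical stand-in for figure 4: error iff (F 1) * 3 = (F 2) + 3.
    return "error" if F(1) * 3 == F(2) + 3 else "ok"

# Iteration 1: the simplest input, an empty table (0 for every argument).
log = []
first = program(make_table_input({}, {}, log))  # "ok"; log now holds [1, 2]

# Iteration 2: extend the table with fresh variables Y and Z and let a
# brute-force search (standing in for the SMT solver) tune their values.
table = {1: "Y", 2: "Z"}
solution = next(
    ((y, z) for y in range(-5, 6) for z in range(-5, 6)
     if program(make_table_input(table, {"Y": y, "Z": z}, [])) == "error"),
    None,
)
```

The first solution found satisfies Y×3 = Z+3, the concolic value of the branch test, so the extended input triggers the error.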

As a final remark in this section, function inputs are regular functions that behave like a concrete input would behave. For the concolic machine, though, the evaluation of their bodies is a source of new information that powers the subsequent iterations of the concolic loop. This is a key observation for concretization in our setting. A concolic tester concretizes calls to functions when it cannot evaluate their bodies in a concolic manner. This situation arises when the function comes from an external library, such as sort from section 1, and the function's code is not under the control of the concolic machine. In the context of this section, this translates to the situation where the function's body cannot interact with any concolic values, nor can its evaluation record path constraints in the log of the machine. A naive solution to the issue is that the concolic machine computes the concrete counterpart of the argument, delegates the call of the function to a concrete machine and then uses the result of the concrete call to proceed. This means, however, that the concolic machine loses any constraints from the evaluation of the body of the argument if the argument is a function itself. Instead, our concolic machine uses a proxy argument for the concrete call that wraps the actual argument. Thus calls to the argument go back to the concolic machine, which records all the usual constraints and only concretizes any first-order results the argument produces. We return to our approach to concretization in section 3.3.

### 2.3 Input Interactions

The previous example supplies a constant number to **F**. However, programs can also supply other, first-order inputs to their function inputs, as in the example in the left-hand column of figure 5.


Figure 5: Interacting Inputs

In order to trigger the error in this example, **F** must be able to return different results from its two different calls. However, if the initial concrete value for **X** is 0, the concrete counterparts of the arguments to the calls to **F** are the same for both calls. Thus, if the concolic machine logs only the concrete counterparts of the arguments as part of input constraints, the concolic tester loses the connection between **X** and the values that the user program passes to **F**. Instead, the concolic machine uses the concolic values when logging input constraints. As shown in the log in the middle column of figure 5, the concolic values of the arguments to the two calls to **F** are ‹**X**› and ‹**X**×2›. Thus, the concolic tester can extend the case of **F** with two clauses: one for when the concrete counterpart of the argument of **F** matches that of ‹**X**› and one for when it matches that of ‹**X**×2›. The effect of this extension is that any problems the concolic tester sends to the SMT solver contain the additional constraint that **X** and **X**×2 are different. Consequently, in a manner similar to the previous example, the concolic tester eventually uses 1 as the concrete value for **X** and discovers the error. The right-hand column of figure 5 displays this counter-example.
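A brief sketch of this idea in Python (invented names; the user program is a hypothetical stand-in for figure 5): clauses are keyed by symbolic expressions, and a clause fires when the concrete counterpart of its expression under the current environment equals the actual argument:

```python
def make_input(clauses, env):
    # clauses pair a symbolic expression with a fresh-variable name; a
    # clause fires when the concrete counterpart of its expression under
    # env equals the actual argument.
    def F(arg):
        for sym, var in clauses:
            if eval(sym, dict(env)) == arg:
                return env[var]
        return 0  # empty-table default
    return F

def program(F, x):
    # Hypothetical stand-in for figure 5: error iff the calls with
    # arguments X and X*2 produce different results.
    return "error" if F(x) != F(x * 2) else "ok"

clauses = [("X", "Y"), ("X * 2", "Z")]
env0 = {"X": 0, "Y": 0, "Z": 1}  # arguments collide: both calls hit clause "X"
env1 = {"X": 1, "Y": 0, "Z": 1}  # arguments differ: distinct clauses fire
```

With X = 0 both calls match the first clause and return the same value, hiding the error; once the solver is told that X and X×2 must differ, X = 1 makes the two clauses fire separately.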

### 2.4 Blind Extensions Are Not Enough

So far we have seen how the concolic tester uses input constraints and concolic values to extend the case expression of a first-order function input. However, the extension may lead the concolic tester to a dead-end. This is a subtle point that, unfortunately, requires a complex example to illustrate. Figure 6 contains the simplest one we know.

This example is complex enough that it deserves a brief walkthrough. To start, note that it has two inputs, **F**, a function from numbers to numbers, and **X**, a number, and that reaching the error requires that the tests of all of the branches of the cond expression of the example fail. In effect, the condition for triggering error is the conjunction of the four formulas that follow the negations in the example. To confirm that this example does have an error-triggering input, take **X** to be -10 and **F** to be λ**x**. 11 × (**x**+11).

If the concolic tester follows the process described so far in this section, it manages to generate an input that makes the tests of the first three branches of cond fail. But then, it seems impossible for the concolic tester to extend the input further to make the test of the fourth branch of cond succeed. To see how this plays out, the middle column in figure 6 shows the state of the concolic machine after a few iterations of the concolic loop. The concolic tester first runs the example with the default constant zero function as the input, which results in 11 and logs the argument **X** for **F**; the concolic tester then extends the case of **F** with a clause that returns a fresh concolic variable **Y**. It then discovers that **Y** must be set to 11 to skip the first branch in the cond expression. For this input, the example produces the result 7, failing to also skip the second branch of the cond expression. After another iteration, the concolic tester manages to skip the second branch of cond and generates the input shown in the middle column of figure 6.


Figure 6: A Complex, Subtle Example

Let us analyze the middle section of the figure to understand the concolic tester's state at this point in the process. The input consists of a function **F** that returns ‹**Y**› when it sees the input ‹**X**›, where **Y** is 11, and returns ‹**Z**› when it sees 0, where **Z** is 121. When we feed this input to the program in the left-hand column, we skip the first and second branches of the cond, because **F** has been tuned to get through them. This part of the execution produces the first four entries in the log. Next the concolic machine arrives at the third branch of the cond and the call (**F** (**X**+10)), which produces the fifth entry in the log. The concrete value of the argument is 11, which has no matching clause in the case of **F**, so **F** returns 0, and the program terminates with 2, following the fourth branch as recorded in the last entry in the log.

The straightforward next step is to insist that this third call has its own distinct clause in **F**, meaning the concolic engine asks the solver for a solution to the equations !(**X** = 0) and !(0 = **X**+10). An input based on the solution to these equations is shown in the third column of figure 6, and it too deserves a careful look. The log is identical up to the last "call" entry so the program evaluates the same to that point. The next entry in the log (second to last) reveals the concolic machine skips the third branch of the cond and thus proceeds with the evaluation of the test **X** = -10 of the fourth branch. Since the value for the input **X** is 1, the machine follows the branch and the program returns 9.

Clearly, since we want the machine to skip the fourth branch too, the tester should present to the solver the same set of equations that led to the latest input and additionally assert **X** = -10. Unfortunately, there is no solution to these equations since they already contain !(**X** = -10) because the first and third clauses of **F** are distinct.

While it is usually a good choice for the concolic tester to force the arguments the user program provides to function inputs to be distinct, in some cases, like this one, it is necessary to do otherwise. Indeed, the very point of this example is that, to be able to reach error, we need to improve the concolic tester's capabilities. More precisely, the concolic tester needs to be able to take a new argument and force it into an existing clause rather than adding a new one. In this example, if the concolic tester forces the argument **X**+10 and the argument 0 to match the same clause, then it can add the equation 0 = **X**+10 to the problem it presents to the SMT solver at the end of the iteration of the concolic loop described in the middle column of figure 6. This extra equation no longer clashes with the equation necessary to skip the fourth branch of the user program (!(**X** = -10)) and, with the help of the SMT solver, the tester can adjust the input in the middle column of figure 6 to use -10 as **X** and trigger the error.

To sum up, at the end of each iteration of the concolic loop there are multiple ways a first-order input can evolve. The concolic tester can use the logged input constraints to assert to the SMT solver that the arguments of a call to the input are different from those of some other calls and extend the case expression of the input accordingly (section 2.2 to section 2.3). Or, it can assert to the SMT solver that the arguments of two calls to the input are equal (section 2.4). In either case, the concolic tester asks the SMT solver to determine the values of first-order inputs. We revisit formally the evolution of inputs in section 3.2. As a concluding note, we underline that the concolic tester may have to try any number of the possible ways an input can evolve. The strategy the concolic tester uses to prioritize and search the space of these possibilities is outside the scope of this paper. Herein, we focus instead on what the concolic tester can do at each point in the concolic loop and whether a sequence of its choices is guaranteed to reveal a possible error in a user program.
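The two evolution moves can be enumerated mechanically. This Python sketch (invented names) produces, for a newly logged argument, one candidate constraint set per move: a fresh clause asserting distinctness from every existing key, or a merge asserting equality with one of them; a brute-force search stands in for the SMT solver:

```python
def evolution_candidates(existing_keys, new_arg):
    # Option 1: a fresh clause -- the new argument differs from every key.
    cands = [[f"({new_arg}) != ({k})" for k in existing_keys]]
    # Options 2..n: merge the new argument into one existing clause.
    cands.extend([[f"({new_arg}) == ({k})"] for k in existing_keys])
    return cands

def solve(constraints, lo=-20, hi=20):
    # Brute-force stand-in for the SMT solver, over a single variable X.
    for x in range(lo, hi + 1):
        if all(eval(c, {"X": x}) for c in constraints):
            return x
    return None

# The situation of figure 6: F's clauses are keyed by X and 0, and the
# new argument is X+10.
cands = evolution_candidates(["X", "0"], "X + 10")
```

Merging X+10 into the clause for 0 yields the constraint (X+10) = 0, forcing X = -10, exactly the assignment the example needs.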

### 2.5 Higher-order Inputs

Handling higher-order inputs, that is, functions that consume and/or return other functions, not just numbers, requires a generalization of the ideas in the previous section. However, the seed of the key insight is already there in the way our concolic tester handles first-order function inputs. Intuitively, the tester treats a first-order function input as a source of new, latent inputs that the concolic tester provides to the user program. As we discuss above, this is exactly the rationale for the fresh input variables that appear in the actions of the case expressions of first-order function inputs.

Contravariantly, when an input consumes a function argument, the tester can simply treat the function argument as a source of further, latent arguments that the user program provides. The input can decide how and when to call its function argument in order to obtain these latent arguments. These function calls, in turn, open up new points where the concolic tester supplies additional inputs to the user program.

Figure 7: Co- & Contravariance at Work

Concretely, consider the left-hand program in figure 7. It has one input, **G**, which consumes a function **f** on numbers and returns a number. As before, the concolic tester starts out by generating the constant zero function. Of course, this does not uncover the error so, same as for first-order function inputs, the concolic tester turns to the input constraints in its log. However, the log simply shows that the user program provides **G** with two procedures. Therefore the case-expression approach does not apply in a straightforward manner. The concolic tester can change the input **G** to return a fresh input variable **X** as in the middle column of figure 7. Unfortunately, this still does not help trigger the error.

While many programming languages offer a certain notion of physical equality for procedures, our approach is for the concolic tester to generate a function **G** that calls its argument **f** and then inspects the result **fY** with a case expression as if it were yet another argument to **G**. In this case, **G** calls **f** with a fresh input variable **Y** and then binds the result to **fY**, which acts as a latent argument that the user program provides to **G**. To account for latent arguments, we generalize input constraints to keep track of variables such as **fY** together with the results of calls to function arguments.

The overall effect is that the concolic tester acquires the vantage point it needs to follow the same process as for first-order function inputs. In particular, the input constraints for **fY** contain the results from calling **f** that in turn are tied to the input variable **Y** and thus under the control of the concolic tester. Furthermore, just like for first-order functions, they provide guidance for filling in the clauses of the case expression of **G**. Concretely, in our example, the input constraints for **fY** record that it is equal to either ‹**Y**+1› or ‹**Y**+2›, which the concolic tester can consider as distinct and, with the help of the SMT solver, generate the **G** on the right-hand side of figure 7 that triggers the error, where **X** and **Z** are fresh input variables mapped to 4 and 5 respectively.

Overall, the concolic tester handles function inputs by decomposing them one layer at a time until it ends up with first-order functions. At each point of decomposition, that is, when an input calls one of its arguments, the concolic tester introduces fresh input variables and logs input constraints that connect the fresh input variables and the calls' results. Then it keeps track of these connections with input constraints and uses the constraints to fill in the case expressions in the bodies of higher-order function inputs. Effectively, this approach entails that the concolic tester considers inputs in a so-called canonical form only. Informally, canonical inputs nest let-expressions and case-expressions. The precise definition of canonical functions and their evolution is the subject of section 3 along with the rest of the model for higher-order concolic testing.
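The decomposition can be sketched as follows in Python (invented names; the user program is a hypothetical stand-in for figure 7): the generated input G probes its function argument with a fresh input Y, logs the result fY as a latent argument, and dispatches on it through a case table whose actions are further fresh inputs:

```python
def make_canonical_G(env, input_log, table):
    # The generated input G: probe the function argument f with the fresh
    # input Y, bind the result fY as a latent argument, dispatch on it.
    def G(f):
        fY = f(env["Y"])
        input_log.append(("fY", fY))  # input constraint for the latent argument
        var = table.get(fY)
        return env[var] if var is not None else 0
    return G

def program(G):
    # Hypothetical stand-in for figure 7: the program hands G the
    # procedures y+1 and y+2 and errors iff the results are 4 and 5.
    a = G(lambda y: y + 1)
    b = G(lambda y: y + 2)
    return "error" if (a, b) == (4, 5) else "ok"

# After evolution: with Y = 0 the observed latent arguments are 1 and 2,
# so the case table routes them to fresh inputs X and Z, tuned to 4 and 5.
env = {"Y": 0, "X": 4, "Z": 5}
log = []
G = make_canonical_G(env, log, {1: "X", 2: "Z"})
```

Running the program with this G logs the two latent-argument constraints and reaches the error, mirroring the co-/contravariant treatment described above.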

# 3 Formalizing Higher-order Concolic Testing

The core of our formal model of higher-order concolic testing is a concolic (abstract) machine that loads and runs user programs, and the input evolution metafunction that generates inputs for the next run.

Figure 8: The Full Input Evolution Cycle

Figure 8 depicts how the concolic machine and the input evolution metafunction work together to form the concolic loop. At the beginning of each iteration of the loop, the load metafunction L consumes the environment that maps each input variable **X** in the user program **e**\_ to a value, and prepares the user program for the concolic machine. The concolic machine evaluates the loaded program, **e**, with the help of two registers: the environment of inputs and the log (which is initially empty). If the result of the evaluation is not an error, the final contents of the log together with the environment determine how the input evolves. Specifically, the *evolve* metafunction uses them to compute a list of pairs, each of which contains a new environment of inputs and a prediction of the contents of the log of the concolic machine after evaluating the program with that new environment. The concolic loop repeats and, with each iteration, explores one more input. When it discovers an error in the user program, the loop terminates and the environment of the error-generating input turns into a concrete counter-example.
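The cycle of figure 8 can be sketched as a worklist loop. This is a schematic illustration only; the parameter names `machine_eval` and `evolve` are stand-ins for the concolic machine and the *evolve* metafunction, which the following sections define formally.

```python
# Schematic sketch of the concolic loop of figure 8.
def concolic_loop(program, initial_env, machine_eval, evolve):
    worklist = [initial_env]
    while worklist:
        env = worklist.pop()
        result, log = machine_eval(program, env)  # one run of the machine
        if result == "error":
            return env  # the environment is the concrete counter-example
        # evolve yields pairs of (new environment, predicted log prefix)
        worklist.extend(new_env for new_env, _predicted in evolve(env, log))
    return None  # search space exhausted without finding an error
```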

Section 3.1 details the concolic machine, section 3.2 formalizes the evolution function and section 3.3 extends the model with concretization.

### 3.1 From User Programs to Concolic Evaluation

```
op  ::= ! | + | - | × | < | = | integer? | procedure?
e_  ::= n | error | x | X | (x. e_) | op e_ | op e_ e_ | e_ e_
      | (cond [e_ e_] ... [else e_])
```
**X**, **Y**, **Z**, **F**, **G**, etc., are concolic variables.

Figure 9: The Language of User Programs

```
CF    ::= (x. casex)
casex ::= (case x)
        | (case x [procedure? e°] [‹t› e°] ...)
e°    ::= v° | (let z = f v° in casez)
v°    ::= x | ‹X› | CF
```
Figure 10: Canonical Functions

Figure 9 collects the constructs of the language of user programs, including numbers **n**, error, applications of primitive operators **op**, multi-way conditional expressions cond, and uppercase variables **X**, **Y**, **F**, etc., for the inputs of a user program. These inputs are either numbers or, as we discuss briefly in section 2.5, functions in canonical form. The error construct represents actual bugs in user programs; dynamic type errors manifest themselves as stuck terms.

Figure 10 provides the formal definition of canonical functions. The body of a canonical function with argument **x** is a **casex** expression with zero or more clauses. As we mention in section 2, a **casex** that has no clauses is equivalent to the constant 0. In contrast to the presentation in section 2, and due to the dynamically-typed nature of our model, the very first clause of every non-empty **casex** always checks whether **x** is a function. If **x** is a function **f**, similar to the discussion in section 2.5, the action **e°** of the procedure? clause is typically a let expression that applies **f** and inspects the result of the application **z** with yet another case expression.<sup>1</sup> If **x** is a number, then the **casex** compares **x** with each of the concolic values ‹**t**› and delegates to the corresponding action **e°**. Similar to the examples of section 2, the argument **v°** for **f** in a let expression is an input, i.e., a concolic value ‹**X**› where **X** is a fresh concolic variable, or a canonical function. The same goes for the actions **e°** of a non-procedure? clause of a case expression. However, in these positions the model can also use variables in scope, in an attempt to identify a counter-example for a user program with fewer concolic loop iterations; this flexibility is also helpful when proving the metatheoretical properties of the model. In general, despite their restricted shape, canonical functions can simulate any function input that triggers an error in a user program. We return to this point in section 4.

As a final remark on canonical functions, one important difference from the discussion of function inputs in section 2.5 is that, herein, each case expression comes with labels. There are two kinds of labels: labels that uniquely identify a case expression and labels that uniquely identify a clause of a case. As we explain further on, their purpose is to allow the concolic tester to analyze the log of the concolic machine after each iteration of the concolic loop in order to direct the evolution of a canonical function.

Figure 11 shows the complete definition of the concolic machine. As we mention at the beginning of this section, the machine has three registers: the input environment, which maps concolic variables **X** to either numbers or canonical functions; the log of constraints; and the term **e** the machine evaluates.

<sup>1</sup> We use let **x** = **e**<sub>1</sub> in **e**<sub>2</sub> as shorthand for (**x**. **e**<sub>2</sub>) **e**<sub>1</sub>.

Figure 11: The Concolic Machine and the Evaluation Language

Evaluation terms **e** are user program terms extended with canonical functions and concolic values ‹**t**›. Recall from section 2 that the latter keep track of the provenance of a value as a symbolic first-order formula **t** that an SMT solver can handle. The concrete counterpart of a concolic value can be computed at any point in the evaluation from **t** and the input environment of the concolic machine with the simple E metafunction.
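A toy rendition of the E metafunction may help fix intuitions: it computes the concrete counterpart of a symbolic formula **t** under the current input environment. The nested-tuple representation of formulas, e.g. `("+", "Y", 1)`, is our own assumption, not the model's definition.

```python
# Toy E metafunction: evaluate a symbolic first-order formula t under
# the input environment env. Formulas are ints, variable names, or
# ("op", lhs, rhs) tuples (our representation, for illustration only).
def E(env, t):
    if isinstance(t, int):
        return t                      # literal constant
    if isinstance(t, str):
        return env[t]                 # input variable, looked up in env
    op, a, b = t                      # compound formula
    ops = {"+": lambda p, q: p + q,
           "-": lambda p, q: p - q,
           "*": lambda p, q: p * q,
           "<": lambda p, q: int(p < q),
           "=": lambda p, q: int(p == q)}
    return ops[op](E(env, a), E(env, b))
```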

The log of the concolic machine collects two kinds of constraints **p**. Path constraints, "R-COND", "FALSE", ‹**t**› and "R-COND", "TRUE", ‹**t**›, are logged by evaluating cond expressions. The first indicates that the test of a branch failed during concolic evaluation; the second that the test succeeded. In either case, the concolic value of the test is ‹**t**›, where the symbolic first-order formula **t** codifies the necessary and sufficient condition for the test to succeed.

Input constraints, "R-CASE", ℓ, **v**, "HIT": *i* and "R-CASE", ℓ, **v**, "MISS", are logged by evaluating case expressions in canonical functions. The label ℓ associates each input constraint with a case expression in the input environment. A "R-CASE", ℓ, **v**, "HIT": *i* constraint indicates that the case expression with label ℓ, given value **v**, followed the action of its clause with label *i*. A "R-CASE", ℓ, **v**, "MISS" constraint indicates that the case with label ℓ, given value **v**, followed the "else" clause that is implicit in our model, whose action is the constant 0. Since the first thing a canonical function does when it interacts with the user program is to inspect the value it receives with case, some of the values **v** in input constraints are exactly the values that the user program provides to function inputs and, consequently, to the concolic tester. Others are the results of calls to functions of the user program that higher-order function inputs perform with their let expressions; these are also values that the user program provides to the concolic tester, as we discuss in section 2.5. Hence, the input constraints here supersede the simplified input constraints from section 2.
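One possible concrete representation of the two kinds of log entries is sketched below. The field names and types are our assumptions for illustration, not the paper's datatypes.

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class PathConstraint:        # "R-COND", outcome, <t>
    outcome: bool            # True if the branch test succeeded
    formula: Any             # the symbolic formula t of the test

@dataclass
class InputConstraint:       # "R-CASE", label, v, "HIT": i / "MISS"
    label: int               # identifies the case expression
    value: Any               # the value v the user program supplied
    hit: Optional[int]       # clause label i on a hit, None on a miss
```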

Since concolic evaluation handles concolic rather than concrete values, the load metafunction L prepares a user program **e**\_ accordingly for the concolic machine. It traverses **e**\_ and replaces every integer **n** with ‹**n**›, every concolic variable **X** with ‹**X**› if the input environment maps **X** to an integer, and every **F** with the actual function if the environment maps **F** to a canonical function. Note that L does not introduce any ‹**F**› since the function bound to **F** can be a higher-order function which, in general, SMT solvers have no theory for.

Given a loaded program, the concolic machine operates in accordance with the reduction rules from figure 12. The rules can be divided into four groups. Group Sym implements base-value provenance tracking for primitive operators. For primitive operators that have straightforward SMT formula counterparts, rule [R-Trace1] produces a concolic value whose formula is formed by the operator and the symbolic provenance of the operands. Otherwise, [R-Trace2] discards the provenance information of the operands and simply returns the concolic value ‹**n**› where **n** is the concrete result of the operation.

The next group, Cond, includes the rules for cond expressions. In general, the concolic machine inspects the concrete counterpart of the value of the test expression in the first clause of a cond to determine whether to take or skip a branch. When the concrete counterpart of ‹**t**› is non-zero, [R-CondTrue] proceeds with the action expression **e**<sub>1</sub> of the first clause and logs the path constraint "R-COND", "TRUE", ‹**t**›. When it is zero, rule [R-CondFalse] drops the first clause of the cond and appends the path constraint "R-COND", "FALSE", ‹**t**› to the log. If the cond has no clauses other than the else one, [R-CondElse] replaces the conditional expression with the action expression **e** of its else clause.

The third group, Case, describes the evaluation of case expressions from canonical functions. When evaluating a case expression, the concolic machine searches the clauses for a match. If the case expression is empty, or if the input **v** is a concolic value whose concrete counterpart is a number that is different from the tests of all clauses, [R-CaseMiss1] and [R-CaseMiss2] (respectively) reduce the case expression to the default action expression ‹0›. They also append the input constraint "R-CASE", ℓ, **v**, "MISS" to the log. Otherwise, the last two rules of the group handle successful matches. For cases where the input **v** is a function (**x**. **e**), [R-CaseHit1] reduces the case to the action expression of its first clause. For cases where the input **v** is a concolic value ‹**t**›, rule [R-CaseHit2] selects the matching clause with label *i* and reduces the case to the corresponding action **e***i*. Both rules log the input constraint "R-CASE", ℓ, **v**, "HIT": *i* with the label ℓ of the case expression, the input **v**, and the label *i* of the matching clause.
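A rough Python reading of the Case group follows. A case expression is represented as a label plus a list of `(clause_label, test, action)` triples; both this representation and the helper name `eval_case` are our assumptions for illustration.

```python
# Evaluate one canonical case expression: return the chosen action
# together with the input constraint to append to the log.
def eval_case(label, clauses, v):
    if callable(v):
        # [R-CaseHit1]: a function always matches the procedure? clause
        for clause_label, test, action in clauses:
            if test == "procedure?":
                return action, ("R-CASE", label, v, ("HIT", clause_label))
    else:
        # [R-CaseHit2]: a number matches the clause whose test equals it
        for clause_label, test, action in clauses:
            if test != "procedure?" and test == v:
                return action, ("R-CASE", label, v, ("HIT", clause_label))
    # [R-CaseMiss1]/[R-CaseMiss2]: no match; the default action is 0
    return 0, ("R-CASE", label, v, "MISS")
```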

The last group, Other, completes the definition of the reduction rules. Rule [R-App] is the standard call-by-value β-reduction, while rules [R-Error] and [R-Ctxt] close the rules over evaluation contexts.


Figure 12: The Reduction Relation of Concolic Evaluation

Before concluding, it is worth mentioning that if the concolic evaluation of a user program raises error, it is straightforward for the concolic tester to produce a counter-example in the language of user programs. All the necessary information is in the latest input environment of the concolic machine.

### 3.2 Evolution of Higher-order Inputs

If the concolic machine evaluates a user program without raising an error, the metafunction *evolve* analyzes the log of the machine and compiles a list of new input environments. Specifically, for each constraint in the log, *evolve* "switches" its truthfulness and computes all new input environments that are compatible with the switched constraint. Here, a new input environment is compatible with the switched constraint if running the user program with it produces a log that has the same prefix as the old log plus the constraint that *evolve* has switched. Put differently, *evolve* returns all possible evolutions of the current input that direct the concolic tester to explore a new aspect of the behavior of the user program. Theorem 3 from section 4 states this property formally.


Figure 13: Negating Conditional Branches in User Programs

Figure 13 collects the three most basic rules of the definition of *evolve*. The first rule, [M-Prefix], is an administrative one; it allows the removal of an arbitrary suffix from the log so that the rest of the rules can focus on the last entry of the remaining log.

The next two rules, [M-False] and [M-True], form the first-order aspect of *evolve* that we discuss in section 2.1. They fire when the last entry of the log is a path constraint from a branch of a cond expression of the user program. Their purpose is to guide *evolve* to generate an input that forces concolic evaluation to change the outcome of the branch. To do so, the two rules replace the constraint with its "negation" and then, with the metafunction *update*, they present the modified list of constraints as a problem to an SMT solver and use the solution to obtain a new input environment.
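An illustrative, first-order-only take on the combination of [M-Prefix] with [M-True]/[M-False]: cut the log after each cond path constraint and negate that constraint. This sketch uses plain tuples and an invented helper name; a real tester would hand each candidate constraint list to an SMT solver (the *update* step) to obtain the new environments.

```python
# For each cond path constraint in the log, produce the truncated,
# negated constraint list that evolve would try to satisfy.
def flip_candidates(log):
    out = []
    for i, entry in enumerate(log):
        if entry[0] != "R-COND":
            continue  # input constraints are handled by the higher-order rules
        _, outcome, t = entry
        out.append(log[:i] + [("R-COND", not outcome, t)])
    return out
```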

Figure 14 presents the higher-order rules and figure 15 contains the auxiliary definitions they need. The higher-order rules switch an input constraint of the form "R-CASE", ℓ, **v**, \_. Recall that such constraints result from the evaluation of the case expression with label ℓ in the body of a canonical function. Thus an input

Figure 14: Directed Evolution of Higher-order Inputs

constraint is sufficient for *evolve* to identify the case expression in the input environment it concerns.

Rules [M-NewProc1] and [M-NewProc2] apply when the case expression with label ℓ is empty. They modify the input environment to extend the case expression with a procedure? clause, the default first clause for recognizing function arguments. Rule [M-NewProc1] handles the situation where **v**, the value the case examines, is a function. To create the new clause, [M-NewProc1] calls actionb to compute new actions; we return to this metafunction towards the end of the section. Rule [M-NewProc2] handles the situation where **v** is a first-order concolic value ‹**t**›. It is the same as [M-NewProc1] except that the new list of constraints still ends with "R-CASE", ℓ, ‹**t**›, "MISS", as ‹**t**› cannot match the new procedure? clause of the case expression.

If the case expression with label ℓ is non-empty, the concolic tester can change its evaluation only when **v** is not a function. After all, if **v** is a function, the evaluation of a non-empty case always follows the first clause of the case. As

**C°** denotes a compatible context of **e°** expressions. *locals*: given a compatible context of canonical functions, computes the set of all variables in scope in the hole. *localsp*: like *locals*, but returns only those variables in scope in the hole that are bound to functions. The metafunctions actionb and actionc generate the candidate new actions: fresh concolic values, concolic variables already bound to numbers, variables in scope, canonical functions with empty case bodies, and let expressions that call a function in scope and inspect the result with an empty case.

Figure 15: Computation of New Actions & Local Variables

we discuss in section 2.3 and section 2.4, if **v** is a first-order concolic value ‹**t**›, the tester has two options: either to extend the case expression with a new clause, or to assert that ‹**t**› matches an existing clause. Rules [M-NewInt] and [M-Change] handle these two cases, respectively. There are two subcases for [M-NewInt]: ‹**t**› matches an existing clause but the tester opts to create a dedicated clause for it in the next iteration of the concolic loop, or ‹**t**› does not match any existing clause and the tester extends the case to accommodate it. In either case, rule [M-NewInt] computes the new actions for the additional clause in the same manner as [M-NewProc1] and inserts the new clause into the case expression. As a last step, rule [M-NewInt] queries the SMT solver to adjust the values of first-order inputs in the environment, ensuring that all the clauses of the extended case are distinct. Rule [M-Change] corresponds to the discussion in section 2.4 and its goal is to assert that ‹**t**› matches an existing clause *i* of the case expression. Hence, *evolve* replaces the last entry of the log with "R-CASE", ℓ, ‹**t**›, "HIT": *i*. Similar to the previous rule, as a last step, rule [M-Change] consults the SMT solver to adjust the input environment given the new constraint about ‹**t**›.

As a final remark, the metafunction actionb computes the set of actions for the new case clauses that *evolve* introduces. The actions largely follow the grammar of **e°** discussed in section 3.1. When it introduces a new function or a let-expression as a new action, actionb constructs an empty case for the corresponding body expression. Moreover, actionb delegates to *locals* and *localsp* to compute the set of variables that new actions can refer to. The metafunction *locals* takes a context **C°** and extracts the set of all local variables visible in the hole. The metafunction *localsp* is similar to *locals* but only extracts variables that are bound to functions.

### 3.3 Adding Concretization

```
e_ ::= .... | concretize(e_)
e  ::= .... | concretize(e)

[R-Concretize]: concretize(‹t›) reduces to ‹n›, where n is the concrete
counterpart of the formula t under the current input environment, as
computed by E
```

### Figure 16: Adding Concretization to Concolic Evaluation

Figure 16 shows the extensions for concretization. For simplicity, we identify concrete values with ‹**n**› and consider such terms as feasible to interoperate with external functions. We do not introduce any specific concrete evaluation rules. Instead, we augment the reduction rules of the concolic machine with the rule [R-Concretize], which reduces the new form, concretize(‹**t**›), to its concrete counterpart with the help of E. Recall that the latter metafunction uses the current input environment to compute the value of the formula **t** of a concolic value.

```
date<     = d1. d2. (or ((date-year d1) < (date-year d2)) )
main-bad  = dates. (let sorted-dates = (sort dates date<) in )
sort/wrap = lst. cmp. (sort lst (x. y. concretize(cmp x y)))
main-ok   = dates. (let sorted-dates = (sort/wrap dates date<) in )
```
### Figure 17: sort With Concretization Wrapper

The astute reader will have noticed that the concretization extension handles only first-order values. In the remainder of the section, by revisiting the example from section 1 in figure 17, we argue informally that this is in fact sufficient, even for functions. In the example, date< is a buggy comparison function and sort is a library function that is polymorphic in its list argument. Since sort is external to the concolic tester, the evaluation of its body is delegated to a concrete machine, which neither records constraints nor handles concolic values. This quickly becomes an issue for testing main-bad. To discover the bug, the concolic machine needs to log constraints from the evaluation of date< and main-bad. However, this implies that date< produces concolic values which flow to sort and disrupt the concrete evaluation of its body.

A straightforward non-solution is to fully concretize the list of dates, but this misses recording the critical path constraints from the evaluation of date<'s body. In contrast, our approach enables both the seamless interoperation of the concolic tester with external libraries and the collection of constraints. The key insight is to create wrappers that strategically concretize concolic values. By assumption, sort is parametric in its input list. Thus sort can consume a list of concolic values as long as the comparison function produces concrete results. This leads to the sort/wrap function that behaves like sort, except that its cmp argument is wrapped in a function that concretizes cmp's return value.
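A Python analogue of the sort/wrap idea (an illustration, not the model's syntax or our prototype's code): the external sorting routine sees only concrete booleans, while cmp itself may be evaluated concolically. Here `concretize` is a stand-in that simply forces a concrete bool.

```python
from functools import cmp_to_key

def concretize(v):
    # stand-in: in the model this computes E of the concolic value's formula
    return bool(v)

def sort_wrap(lst, cmp):
    # wrap cmp so every result it returns is concretized before the
    # external sorting routine consumes it
    wrapped = lambda x, y: concretize(cmp(x, y))
    return sorted(lst, key=cmp_to_key(lambda x, y: -1 if wrapped(x, y) else 1))
```

In the figure, main-ok plays the role of a caller that passes the (possibly concolic) date< comparator to sort_wrap instead of sort.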

The mechanism for creating correct wrappers for higher-order constructs from user annotations is well-studied [13, 16, 40], thus we do not formalize it. However, we note that our proof-of-concept implementation, discussed in section 5, supports all the necessary features to run the example of this section including lists, external functions, concretization annotations and interoperability between a concrete and a concolic machine.

# 4 Correctness of Higher-order Concolic Testing

This section establishes three facts about our concolic tester that together entail its correctness. First, given an input, if the concolic evaluation of a user program triggers an error, so does the concrete evaluation of the program (soundness). Second, relative to the completeness of SMT solvers, the concolic tester always manages to produce an input in canonical form that triggers error in the user program, if a counter-example for the program exists (completeness). Third, for each iteration of the concolic loop, the concolic tester produces a new input that explores a specific, selected-in-advance aspect of the behavior of the user program (directness). Here we discuss the necessary bits for the formal statements of the three facts. The complete formal development, with all the proofs, is at https://github.com/shhyou/chop-esop-supplementary.

Soundness guarantees that the concolic machine respects the semantics of user programs: neither the information that the concolic machine logs nor its use of concolic values affects the evaluation of programs. Specifically, the soundness theorem states that if the concolic evaluation of user program **e**\_ with a proper input environment<sup>2</sup> reduces to error, the concrete evaluation of **e**\_ with the same environment also reduces to error. Since error represents bugs in the user program, soundness effectively reassures us that concolic evaluation does not discover spurious bugs.

For the formal statement of the theorem, we first introduce a few technical devices. For closed user programs, i.e., those without input or other free variables, we define a standard call-by-value reduction semantics. Let C be the metafunction that, given an input environment and a user program **e**\_, constructs concrete inputs from the environment and substitutes them in **e**\_. That is, C traverses the user program **e**\_, dropping any concretize forms and, for each **X** in **e**\_, if the environment maps **X** to a number, C replaces **X** with that number. Otherwise, if the environment maps **X** to a function, C compiles the canonical function into an equivalent concrete function and replaces **X** with the result.

<sup>2</sup> An environment is *proper* if (i) it maps all concolic variables that occur free in its canonical functions to numbers, (ii) all labels in it are unique, and (iii) the concrete counterparts of the tests of the clauses in its case expressions are numbers. In this section, we only consider proper environments.

Theorem 1 (Soundness). Let **e**\_ be any user program written in the extended language from section 3.3, i.e., **e**\_ with concretize forms. Let ρ be any input environment closing **e**\_. If ⟨ρ, [], L⟨ρ, **e**\_⟩⟩ →* ⟨ρ, π, error⟩ then C⟨ρ, **e**\_⟩ →* error.

Completeness captures that if the concrete evaluation of a user program with some input raises error, our concolic tester can find that input through the iterative evolution of initially default inputs. More precisely, Theorem 2 formalizes the iterative evolution process as a sequence of pairs of inputs and logs ⟨ρ<sub>1</sub>, π<sub>1</sub>⟩, …, ⟨ρ<sub>m</sub>, π<sub>m</sub>⟩ such that (i) the sequence starts with an input environment that contains numbers and default canonical functions and ends with an input environment that triggers error; (ii) each π<sub>i</sub> is the log produced by the concolic evaluation of the user program with input environment ρ<sub>i</sub>; and (iii) most importantly, every adjacent pair in the sequence is connected by *evolve*: ⟨ρ<sub>i+1</sub>, π⟩ ∈ *evolve*⟨ρ<sub>i</sub>, π<sub>i</sub>⟩ and π is equivalent to a prefix of π<sub>i+1</sub>. In particular, conclusion (iii) says that, using the logs from each iteration, *evolve* predicts the logs for the next iteration.

Theorem 2 (Completeness). For any **e**\_ written in the user language of section 3.1 with concolic variables **X**<sub>1</sub>, …, **X**<sub>n</sub>, if there exist closed values **v**\_<sub>1</sub>, …, **v**\_<sub>n</sub> in the language of user programs such that none of the values contain error and **e**\_{**X**<sub>1</sub> ↦ **v**\_<sub>1</sub>, …} →* error, then there exists a sequence of environments and logs ⟨ρ<sub>1</sub>, π<sub>1</sub>⟩, …, ⟨ρ<sub>m</sub>, π<sub>m</sub>⟩ such that dom(ρ<sub>1</sub>) = {**X**<sub>1</sub>, …, **X**<sub>n</sub>} and


There are two points worth unpacking here. First, conclusion 1 assumes an appropriate choice between numbers and default canonical functions in the initial environment *1*. In an implementation, either the user supplies an input specification or the tester employs some sophisticated search strategy over all combinations. Second, since the user program may diverge, in conclusion 2 the concolic machine may need to end the evaluation early. As the maximum number of steps needed is finite, an implementation can overcome this by setting a time limit.

We prove Theorem 2 in two steps. First, we show that if there is an input for which the concrete evaluation of a user program raises error, then there exists an input environment containing numbers and canonical functions that also causes the concolic machine to trigger an error. This step thus validates the definition of canonical functions.

Lemma 1 (Representation Completeness). We say that ⟨ρ, π⟩ is a proper counterexample for a user program **e**\_ if (i) ρ closes **e**\_, i.e., FV(**e**\_) ⊆ dom(ρ), (ii) ⟨ρ, [], L⟨ρ, **e**\_⟩⟩ →* ⟨ρ, π, error⟩ and (iii) π does not contain input constraints of the form "R-CASE", ℓ, **v**, "MISS".

For any user program **e**\_ with inputs **X**<sub>1</sub>, …, **X**<sub>n</sub>, if there exist closed values **v**\_<sub>1</sub>, …, **v**\_<sub>n</sub> such that no value contains error and **e**\_{**X**<sub>1</sub> ↦ **v**\_<sub>1</sub>, …} →* error, then there exists a proper counterexample of **e**\_.

In the second step of the proof of Theorem 2, we show that the evolution of inputs during the concolic loop results in an input environment that triggers an error whenever such an input exists. As a consequence, the concolic tester only needs to explore the inputs it generates with *evolve*.

Lemma 2 (Search Completeness). For any **e**\_ with inputs **X***1*,..., **X***n*, if **e**\_ has a proper counterexample then there exists a sequence of environments and logs satisfying Theorem 2 (1)–(4).

The last fact we establish for our concolic tester is necessary for the proof of Lemma 2, but also has value on its own. It entails that, at each iteration of the concolic loop, the concolic tester aims to explore a specific aspect of the behavior of the user program and indeed produces new inputs that achieve this goal. We call this the concolic property. Formally, Theorem 3 shows that after the concolic machine evaluates a user program with an input environment produced by *evolve*, the machine's log is a prefix of the log *evolve* predicts.

Theorem 3 (Concolic). For any **e**\_ and ρ<sub>1</sub>, if

1. ⟨ρ<sub>1</sub>, [], L⟨ρ<sub>1</sub>, **e**\_⟩⟩ →* ⟨ρ<sub>1</sub>, π<sub>1</sub> ++ [**p**<sub>1</sub>], **e**<sub>1</sub>⟩, and
2. ⟨ρ<sub>2</sub>, π ++ [**p**]⟩ ∈ *evolve*⟨ρ<sub>1</sub>, π<sub>1</sub> ++ [**p**<sub>1</sub>]⟩,

then ⟨ρ<sub>2</sub>, [], L⟨ρ<sub>2</sub>, **e**\_⟩⟩ →* ⟨ρ<sub>2</sub>, π<sub>2</sub> ++ [**p**<sub>2</sub>], **e**<sub>2</sub>⟩ such that π ++ [**p**] is equivalent to π<sub>2</sub> ++ [**p**<sub>2</sub>].

# 5 From the Model to a Proof-of-Concept Implementation

A question about our model is whether it can serve as a guide for an effective higher-order concolic tester. To provide some positive evidence, we have implemented a prototype that closely follows the model. The prototype plays the role of a sanity check that our theoretically-correct model is not inherently impractical; performance was not a serious concern. Notably, the prototype's input generation strategy is naive. To ensure progress, the prototype sets a configurable timeout for each run and keeps a log of the inputs it generates to avoid duplicating work. We leave the details to https://github.com/shhyou/chop-esop-supplementary and only summarize our experimental results here.

We compiled a benchmark suite from three sources. The primary source is Nguyễn et al. [33]'s work, specifically the jfp branch of https://github.com/philnguyen/soft-contract. These programs ultimately come from other papers; see figure 18. The second source is CutEr [18], the Erlang concolic tester. We collected all of the test cases in CutEr's test suite that use higher-order functions and translated them to our prototype's language. Finally, we contribute three small examples as part of this work that have proven out of reach for


Figure 18: Benchmark Results

both Nguyễn et al. [33]'s tool and CutEr. Overall, the benchmark programs use the Scheme numeric tower, booleans, lists, objects encoded as functions [1], strings, symbols, and higher-order functions.

Out of 118 benchmarks, our prototype fails to discover bugs in 4 of the programs. These failures stem from two limitations of our prototype. First, our search strategy is naive and, as a result, two benchmarks time out after an hour. Second, our prototype does not handle Racket's struct declarations and a few other complex syntactic features of Racket that two of Nguyễn et al. [33]'s benchmarks use.

# 6 Related Work

Concolic Testing. CutEr [17, 18] is a concolic testing tool for Erlang [4]. Although CutEr generates functions, it does not generate inputs that contain calls in their bodies.<sup>3</sup> Palacios and Vidal [34] offer an instrumentation approach for concolic testers of functional languages but do not address the generation of higher-order inputs.

Li et al. [31] extend the design of path constraints with symbolic subtype expressions to handle polymorphism in object-oriented languages. However, their input generation uses only already defined classes.

Path explosion remains a central challenge for concolic testing techniques [5, 9], and it is a challenge that has led to approaches that rely on the correct handling of function inputs. Godefroid [19] computes function summaries on-the-fly to tame the combinatorial explosion of the search space of control-flow paths. Similarly, Anand et al. [2] perform symbolic execution compositionally using function summaries. FOCAL [24] breaks programs down into small units to reduce the search space; it tests each unit individually and constructs system-level tests using summaries. In all three cases, the summaries are first-order and do not include higher-order interactions between functions.

<sup>3</sup> Personal communication with Kostis Sagonas.

Higher-order Symbolic Execution. Nguyễn et al. [33] and Tobin-Hochstadt and Van Horn [45] propose the idea of refining symbolic unknown values into canonical shapes to generate higher-order counterexamples. We adapt their refinement rules into the grammar of canonical functions in figure 10. Unfortunately, despite claims to the contrary, their rules are not complete and fail to generate a counterexample for our buggy call-twice from section 1.<sup>4</sup> Our work provably fixes this issue. Moreover, we introduce the notion of input constraints to support the directed search of the higher-order input space.

Random Testing. QuickCheck [11] supports random testing of higher-order functions by using user-provided maps from the input type to integers and from integers to the output type. Koopman and Plasmeijer [28] improve upon QuickCheck by using a predefined datatype to represent the syntax of higher-order functions. LambdaTester [36] focuses on testing and generating higher-order functions that mutate an object's state in order to affect control-flow paths that depend on this state. Klein et al. [26] randomly generate higher-order inputs that call their arguments to trigger bugs in stateful programs with opaque types.

# 7 Conclusion

This work offers a theoretical roadmap for generalizing concolic testing to programs with higher-order inputs. The central innovation is that our concolic tester records salient information about the interactions between a user program and its (canonical) inputs. The information induces an SMT problem that describes a new canonical input that exercises a yet unexplored aspect of the user program.

For this paper, we focus on the quintessential higher-order linguistic feature, higher-order functions. That said, much remains to be done to build this theory into a production tool by, for example, using the insights of this paper to support other features such as objects and mutable state. Specifically for state, our model can be easily and soundly extended to imperative user programs. However, completeness and the generation of stateful function inputs require further study. Finally, another important direction is improving the implementation, notably exploring search optimizations and strategies. Our prototype uses a naive strategy and this hampers its performance. Nevertheless, we view this paper as an essential first step towards sophisticated testing strategies for modern programming languages.

Acknowledgments We appreciate Phúc C. Nguyễn, Sam Tobin-Hochstadt, David Van Horn, Aggelos Giantsios, Nikolaos Papaspyrou and Konstantinos Sagonas for explaining the details of their work and being an inspiration for ours. We thank Spencer P. Florence, Lukas Lazarek, Wung Jae Lee, Alex Owens, Peter Zhong, and the anonymous reviewers for their thoughtful feedback on earlier versions of this paper. This material is based upon work supported by the National Science Foundation under Grant No. CNS-1823244.

<sup>4</sup> Personal communication with Phúc Nguyễn.

# References


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/ 4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Strong-Separation Logic**

Jens Pagel and Florian Zuleger ( )

TU Wien, Vienna, Austria {pagel,zuleger}@forsyte.at

**Abstract.** Most automated verifiers for separation logic are based on the symbolic-heap fragment, which disallows both the magic-wand operator and the application of classical Boolean operators to spatial formulas. This is not surprising, as support for the magic wand quickly leads to undecidability, especially when combined with inductive predicates for reasoning about data structures. To circumvent these undecidability results, we propose assigning a more restrictive semantics to the separating conjunction. We argue that the resulting logic, strong-separation logic, can be used for symbolic execution and abductive reasoning just like "standard" separation logic, while remaining decidable even in the presence of both the magic wand and the list-segment predicate—a combination of features that leads to undecidability for the standard semantics.

# **1 Introduction**

Separation logic [40] is one of the most successful formalisms for the analysis and verification of programs making use of dynamic resources such as heap memory and access permissions [7,30,10,5,17,24,9]. At the heart of the success of separation logic (SL) is the separating conjunction, ∗, which supports concise statements about the disjointness of resources. In this article, we will focus on separation logic for describing the heap in single-threaded heap-manipulating programs. In this setting, the formula ϕ ∗ ψ can be read as "the heap can be split into two disjoint parts, such that ϕ holds for one part and ψ for the other."

Our article starts from the following observation: The standard semantics of ∗ allows splitting a heap into two arbitrary sub-heaps. The magic-wand operator −∗, which is the adjoint of ∗, then allows adding arbitrary heaps. This arbitrary splitting and adding of heaps makes reasoning about SL formulas difficult, and quickly renders separation logic undecidable when inductive predicates for data structures are considered. For example, Demri et al. recently showed that adding only the singly-linked list-segment predicate to propositional separation logic (i.e., with ∗, −∗ and classical connectives ∧, ∨, ¬) leads to undecidability [16].

Most SL specifications used in automated verification do not, however, make use of arbitrary heap compositions. For example, the widely used symbolic-heap fragments of separation logic considered, e.g., in [3,4,13,21,22], have the following property: a symbolic heap satisfies a separating conjunction if and only if one can split the model at locations that are the values of some program variables.

Motivated by this observation, we propose a more restrictive separating conjunction that allows splitting the heap only at locations that are the values of some

<sup>©</sup> The Author(s) 2021

N. Yoshida (Ed.): ESOP 2021, LNCS 12648, pp. 664–692, 2021. https://doi.org/10.1007/978-3-030-72019-3_24

Fig. 1: Two models and their decomposition into disjoint submodels. Dangling arrows represent dangling pointers. (a) A model of ls(x, y) ∗ ls(y, nil) in both the standard semantics and our semantics. (b) A model of ls(x, nil) ∗ t in the standard semantics.

program variables. We call the resulting logic strong-separation logic. Strong-separation logic (SSL) shares many properties with standard separation-logic semantics; for example, the models of our logic form a separation algebra. Because the frame rule and other standard SL inference rules continue to hold for SSL, SSL is suitable for deductive Hoare-style verification à la [23,40], symbolic execution [4], as well as abductive reasoning [10,9]. At the same time, SSL has much better computational properties than standard SL—especially when formulas contain expressive features such as the magic wand, −∗, or negation.

We now give a more detailed introduction to the contributions of this article.

The standard semantics of the separating conjunction. To be able to justify our changed semantics of ∗, we need to introduce a bit of terminology. As standard in separation logic, we interpret SL formulas over stack–heap pairs. A stack is a mapping of the program variables to memory locations. A heap is a finite partial function between memory locations; if a memory location l is mapped to location l′, we say the heap contains a pointer from l to l′. A memory location l is allocated if there is a pointer of the heap from l to some location l′. We call a location dangling if it is the target of a pointer but not allocated; a pointer is dangling if its target location is dangling.

Dangling pointers arise naturally in compositional specifications, i.e., in formulas that employ the separating conjunction ∗: In the standard semantics of separation logic, a stack–heap pair (s, h) satisfies a formula ϕ∗ψ, if it is possible to split the heap h into two disjoint parts h<sup>1</sup> and h<sup>2</sup> such that (s, h1) satisfies ϕ and (s, h2) satisfies ψ. Here, disjoint means that the allocated locations of h<sup>1</sup> and h<sup>2</sup> are disjoint; however, the targets of the pointers of h<sup>1</sup> and h<sup>2</sup> do not have to be disjoint.

We illustrate this in Fig. 1a, where we show a graphical representation of a stack–heap pair (s, h) that satisfies the formula ls(x, y) ∗ ls(y, nil). Here, ls denotes the list-segment predicate. As shown in Fig. 1a, h can be split into two disjoint parts h<sup>1</sup> and h<sup>2</sup> such that (s, h1) is a model of ls(x, y) and (s, h2) is a model of ls(y, nil). Now, h<sup>1</sup> has a dangling pointer with target s(y) (displayed with an orange background), while no pointer is dangling in the heap h.

In what sense is the standard semantics too permissive? The standard semantics of ∗ allows splitting a heap into two arbitrary sub-heaps, which may result in the introduction of arbitrary dangling pointers into the sub-heaps. We note, however,

that the introduction of dangling pointers is not arbitrary when splitting the models of ls(x, y) ∗ ls(y, nil); there is only one way of splitting the models of this formula, namely at the location of program variable y. The formula ls(x, y)∗ ls(y, nil) belongs to a certain variant of the symbolic-heap fragment of separation logic, and all formulas of this fragment have the property that their models can only be split at locations that are the values of some variables.

Standard SL semantics also allows the introduction of dangling pointers without the use of variables. Fig. 1b shows a model of ls(x, nil) ∗ t—assuming the standard semantics. Here, the formula t (for true) stands for any arbitrary heap. In particular, this includes heaps with arbitrary dangling pointers into the list segment ls(x, nil). This power of introducing arbitrary dangling pointers is what is used by Demri et al. for their undecidability proof of propositional separation logic with the singly-linked list-segment predicate [16].

Strong-separation logic. In this article, we want to explicitly disallow the implicit sharing of dangling locations when composing heaps. We propose to parameterize the separating conjunction by the stack and exclusively allow the union of heaps that only share locations that are pointed to by the stack. For example, the model in Fig. 1b is not a model of ls(x, nil) ∗ t in our semantics because of the dangling pointers in the sub-heap that satisfies t. Strong-separation logic (SSL) is the logic resulting from this restricted definition of the separating conjunction.

Why should I care? We argue that SSL is a promising proposal for automated program verification:

1) We show that the memory models of strong-separation logic form a separation algebra [11], which guarantees the soundness of the standard frame rule of SL [40]. Consequently, SSL can potentially be used instead of standard SL in a wide variety of (semi-)automated analyzers and verifiers, including Hoare-style verification [40], symbolic execution [4], and bi-abductive shape analysis [10].

2) To date, most automated reasoners for separation logic have been developed for symbolic-heap separation logic [3,4,10,21,22,26,32,27]. In these fragments of separation logic, assertions about the heap can exclusively be combined via ∗; neither the magic wand −∗ nor classical Boolean connectives are permitted. We show that the strong semantics agrees with the standard semantics on symbolic heaps. For this reason, symbolic-heap SL specifications remain unchanged when switching to strong-separation logic.

3) We establish that the satisfiability and entailment problem for full propositional separation logic with the singly-linked list-segment predicate is decidable in our semantics (in PSpace)—in stark contrast to the aforementioned undecidability result obtained by Demri et al. [16] assuming the standard semantics.

4) The standard Hoare-style approach to verification requires discharging verification conditions (VCs), which amounts to proving for loop-free pieces of code that a pre-condition implies some post-condition. Discharging VCs can be automated by calculi that symbolically execute the pre-condition forward resp. the post-condition backward, and then by using an entailment checker to prove the implication. For SL, symbolic execution calculi can be formulated using the magic wand resp. the septraction operator. However, these operators have proven to be difficult for automated procedures: "VC-generators do not work especially well with separation logic, as they introduce magic-wand −∗ operators which are difficult to eliminate." [2, p. 131] In contrast, we demonstrate that SSL can overcome the described difficulties. We formulate a forward symbolic execution calculus for a simple heap-manipulating programming language using SSL. In conjunction with our entailment checker, see 3), our calculus gives rise to a fully-automated procedure for discharging VCs of loop-free code segments.

5) Computing solutions to the abduction problem is an integral building block of Facebook's Infer analyzer [9], required for a scalable and fully-automated shape analysis [10]. We show how to compute explicit representations of optimal, i.e., logically weakest and spatially minimal, solutions to the abduction problem for the separation logic considered in this paper. The result is of theoretical interest, as explicit representations for optimal solutions to the abduction problem are hard to obtain [10,19].

Contributions. Our main contributions are as follows:


We strongly believe that these results motivate further research on SSL (e.g., going beyond the singly-linked list-segment predicate, implementing our decision procedure and integrating it into fully-automated analyzers).

Related work. The undecidability of separation logic was established already in [12]. Since then, decision problems for a large number of fragments and variants of separation logic have been studied. Most of this work has been on symbolic-heap separation logic or other variants of the logic that neither support the magic wand nor the use of negation below the ∗ operator. While entailment in the symbolic-heap fragment with inductive definitions is undecidable in general [1], there are decision procedures for variants with built-in lists and/or trees [3,13,34,35,36], support for defining variants of linear structures [20] or tree structures [42,22] or graphs of bounded tree width [21,26]. The expressive heap logics Strand [29] and Dryad [37] also have decidable fragments, as have some other separation logics that allow combining shape and data constraints. Besides the already mentioned work [35,36], these include [28,25].

<sup>1</sup> An extension of this result to a separation logic that also supports trees can be found in the dissertation of the first author [31].

Among the aforementioned works, the graph-based decision procedures of [13] and [25] are most closely related to our approach. Note, however, that neither of these works supports reasoning about magic wands or negation below the separating conjunction.

In contrast to symbolic-heap SL, separation logics with the magic wand quickly become undecidable. Propositional separation logic with the magic wand, but without inductive data structures, was shown to be decidable in PSpace in the early days of SL research [12]. Support for this fragment was added to CVC4 a few years ago [39]. Some tools have "lightweight" support for the magic wand involving heuristics and user annotations, in part motivated by the lack of decision procedures [6,41].

There is a significant body of work studying first-order SL with the magic wand and unary points-to assertions, but without a list predicate. This logic was first shown to be undecidable in [8]; a result that has since been refined, showing e.g. that while satisfiability is still in PSpace if we allow one quantified variable [15], two variables already lead to undecidability, even without the separating conjunction [14]. Echenim et al. [18] have recently addressed the satisfiability problem of SL with ∃<sup>∗</sup>∀<sup>∗</sup> quantifier prefix, separating conjunction, magic wand, and full Boolean closure, but no inductive definitions. The logic was shown to be undecidable in general (contradicting an earlier claim [38]), but decidable in PSpace under certain restrictions.

Outline. In Section 2, we introduce two semantics of propositional separation logic, the standard semantics and our new strong-separation semantics. We show the decidability of the satisfiability and entailment problems of SSL with lists in Section 3. We present symbolic execution rules for SSL in Section 4. We show how to compute explicit representations of optimal solutions to the abduction problem in Section 5. We conclude in Section 6. All missing proofs are given in the extended version [33] for space reasons.

# **2 Strong- and Weak-Separation Logic**

### **2.1 Preliminaries**

We denote by |X| the cardinality of the set X. Let f be a (partial) function. Then dom(f) and img(f) denote the domain and image of f, respectively. We write |f| := |dom(f)| and f(x) = ⊥ for x ∉ dom(f). We frequently use set notation to define and reason about partial functions: f := {x1 → y1,...,xk → yk} is the partial function that maps xi to yi, 1 ≤ i ≤ k, and is undefined on all other values; f⁻¹(b) is the set of all elements a with f(a) = b; we write f ∪ g resp. f ∩ g for the union resp. intersection of partial functions f and g, provided that f(a) = g(a) for all a ∈ dom(f) ∩ dom(g); similarly, f ⊆ g holds if dom(f) ⊆ dom(g) and f(a) = g(a) for all a ∈ dom(f). Sets and ordered sequences are denoted in boldface, e.g., **x**. To list the elements of a sequence, we write x1,...,xk.
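The set-style treatment of partial functions above can be mirrored concretely. The following Python sketch (our own illustration, not part of the paper) models a partial function as a dict, with the undefined value ⊥ rendered as None:

```python
def pf_union(f, g):
    """Union f ∪ g of partial functions; defined only when f and g
    agree on their common domain (None plays the role of ⊥)."""
    if any(f[x] != g[x] for x in f.keys() & g.keys()):
        return None
    return {**f, **g}

def pf_preimage(f, b):
    """f⁻¹(b): the set of all elements a with f(a) = b."""
    return {a for a, y in f.items() if y == b}

f = {"x1": 1, "x2": 2}
g = {"x2": 2, "x3": 3}
assert pf_union(f, g) == {"x1": 1, "x2": 2, "x3": 3}
assert pf_union(f, {"x2": 9}) is None  # disagreement on x2 → ⊥
assert pf_preimage(f, 2) == {"x2"}
```
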

We assume a linearly-ordered infinite set of variables **Var** with nil ∈ **Var** and denote by max(**v**) the maximal variable among a set of variables **v** according

$$\begin{array}{l} \tau ::= \mathtt{emp} \mid x \mapsto y \mid \mathtt{ls}(x, y) \mid x = y \mid x \neq y \\ \varphi ::= \tau \mid \varphi \* \varphi \mid \varphi \mathbin{-\!\circledast} \varphi \mid \varphi \land \varphi \mid \varphi \lor \varphi \mid \neg \varphi \end{array}$$

Fig. 2: The syntax of separation logic with list segments.

(s, h) |= **emp** iff dom(h) = ∅

(s, h) |= x = y iff dom(h) = ∅ and s(x) = s(y)

(s, h) |= x ≠ y iff dom(h) = ∅ and s(x) ≠ s(y)

(s, h) |= x → y iff h = {s(x) → s(y)}

(s, h) |= ls(x, y) iff either dom(h) = ∅ and s(x) = s(y), or there exist n ≥ 1 and pairwise-distinct locations ℓ0,...,ℓn with h = {ℓ0 → ℓ1,...,ℓn−1 → ℓn}, s(x) = ℓ0 and s(y) = ℓn

(s, h) |= ϕ1 ∧ ϕ2 iff (s, h) |= ϕ1 and (s, h) |= ϕ2

(s, h) |= ϕ1 ∨ ϕ2 iff (s, h) |= ϕ1 or (s, h) |= ϕ2

(s, h) |= ¬ϕ iff (s, h) ⊭ ϕ

(s, h) wk|= ϕ1 ∗ ϕ2 iff there exist h1, h2 with h = h1 + h2, (s, h1) wk|= ϕ1 and (s, h2) wk|= ϕ2

(s, h) wk|= ϕ1 −⊛ ϕ2 iff there exists h1 with (s, h1) wk|= ϕ1, h + h1 ≠ ⊥ and (s, h + h1) wk|= ϕ2

(s, h) st|= ϕ1 ∗ ϕ2 iff there exist h1, h2 with h = h1 ⊎ˢ h2, (s, h1) st|= ϕ1 and (s, h2) st|= ϕ2

(s, h) st|= ϕ1 −⊛ ϕ2 iff there exists h1 with (s, h1) st|= ϕ1, h ⊎ˢ h1 ≠ ⊥ and (s, h ⊎ˢ h1) st|= ϕ2

Fig. 3: The standard, "weak" semantics of separation logic, wk <sup>|</sup>=, and the "strong" semantics, st <sup>|</sup>=. We write <sup>|</sup>= when there is no difference between wk <sup>|</sup>= and st <sup>|</sup>=.

to this order. In Fig. 2, we define the syntax of the separation-logic fragment we study in this article. The atomic formulas of our logic are the empty-heap predicate **emp**, points-to assertions x → y, the list-segment predicate ls(x, y), equalities x = y and disequalities x ≠ y<sup>2</sup>; in all these cases, x, y ∈ **Var**. Formulas are closed under the classical Boolean operators ∧, ∨, ¬ as well as under the separating conjunction ∗ and the existential magic wand, also called the septraction, −⊛ (see e.g. [8]). We collect the set of all SL formulas in **SL**. We also consider derived operators and formulas, in particular the separating implication (or magic wand), −∗, defined by ϕ −∗ ψ := ¬(ϕ −⊛ ¬ψ).<sup>3</sup> We also use true, defined as t := **emp** ∨ ¬**emp**. Finally, for Φ = {ϕ1,...,ϕn}, we define ∗Φ := ϕ1 ∗ ϕ2 ∗ ··· ∗ ϕn if n ≥ 1 and ∗Φ := **emp** if n = 0. By fvs(ϕ) we denote the set of (free) variables of ϕ. We define the size of the formula ϕ as |ϕ| := 1 for atomic formulas ϕ, |ϕ1 × ϕ2| := |ϕ1| + |ϕ2| + 1 for × ∈ {∧, ∨, ∗, −⊛}, and |¬ϕ1| := |ϕ1| + 1.
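The inductive size measure |ϕ| transcribes directly into a short recursion. The following Python sketch uses a minimal AST encoding of our own (not the paper's): atoms are strings, negation is a pair, and binary operators are triples.

```python
# Minimal AST sketch (encoding is ours): atoms are strings such as "emp",
# "x -> y", "ls(x,y)"; ("not", phi) is negation; (op, phi1, phi2) covers
# the binary operators ∧, ∨, ∗, −⊛.
def size(phi):
    if isinstance(phi, str):      # |ϕ| = 1 for atomic formulas
        return 1
    if phi[0] == "not":           # |¬ϕ1| = |ϕ1| + 1
        return size(phi[1]) + 1
    _, p1, p2 = phi               # |ϕ1 × ϕ2| = |ϕ1| + |ϕ2| + 1
    return size(p1) + size(p2) + 1

phi = ("and", ("sep", "ls(x,y)", "ls(y,nil)"), ("not", "emp"))
assert size(phi) == 6
```
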

### **2.2 Two Semantics of Separation Logic**

Memory model. **Loc** is an infinite set of heap locations. A stack is a partial function s: **Var** ⇀ **Loc**. A heap is a partial function h: **Loc** ⇀ **Loc**. A model is a stack–heap pair (s, h) with nil ∈ dom(s) and s(nil) ∉ dom(h). We let locs(h) :=

<sup>2</sup> As our logic contains negation, x ≠ y can be expressed as ¬(x = y). However, we treat disequalities as atomic to be able to use them in the positive fragment of our logic, defined later, which precludes the use of negation.

<sup>3</sup> As −∗ can be defined via −⊛ and ¬ and vice-versa, the expressivity of our logic does not depend on which operator we choose. We have chosen −⊛ because we can include this operator in the positive fragment considered later on.

dom(h) ∪ img(h). A location ℓ is dangling if ℓ ∈ img(h) \ dom(h). We write **S** for the set of all stacks and **H** for the set of all heaps.

Two notions of disjoint union of heaps. We write h1+h<sup>2</sup> for the union of disjoint heaps, i.e.,

$$h\_1 + h\_2 := \begin{cases} h\_1 \cup h\_2, & \text{if } \text{dom}(h\_1) \cap \text{dom}(h\_2) = \emptyset \\ \bot, & \text{otherwise}. \end{cases}$$

This standard notion of disjoint union is commonly used to assign semantics to the separating conjunction and magic wand. It requires that h<sup>1</sup> and h<sup>2</sup> are domain-disjoint, but does not impose any restrictions on the images of the heaps. In particular, the dangling pointers of h<sup>1</sup> may alias arbitrarily with the domain and image of h<sup>2</sup> and vice-versa.

Let s be a stack. We write h1 ⊎ˢ h2 for the disjoint union of h1 and h2 that restricts aliasing of dangling pointers to the locations in stack s. This yields an infinite family of union operators: one for each stack. Formally,

$$h\_1 \uplus^s h\_2 := \begin{cases} h\_1 + h\_2, & \text{if } \mathsf{locs}(h\_1) \cap \mathsf{locs}(h\_2) \subseteq \mathsf{img}(s), \\\bot, & \text{otherwise}. \end{cases}$$

Intuitively, h1 ⊎ˢ h2 is the (disjoint) union of heaps that share only locations that are in the image of the stack s. Note that if h1 ⊎ˢ h2 is defined then h1 + h2 is defined, but not vice-versa.
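The two union operators can be transcribed directly. In the following Python sketch (our own illustration, not from the paper), heaps are dicts from locations to locations, stacks are dicts from variable names to locations, and the undefined result ⊥ is rendered as None:

```python
def locs(h):
    """locs(h) = dom(h) ∪ img(h) for a heap rendered as a dict."""
    return set(h) | set(h.values())

def plus(h1, h2):
    """Standard disjoint union h1 + h2: only requires disjoint domains."""
    if set(h1) & set(h2):
        return None  # ⊥
    return {**h1, **h2}

def uplus(s, h1, h2):
    """Strong union h1 ⊎ˢ h2: shared locations must be stack values."""
    if not (locs(h1) & locs(h2)) <= set(s.values()):
        return None  # ⊥
    return plus(h1, h2)

# s(y) = 2 is a stack value, so the two list segments may share it:
s = {"x": 1, "y": 2, "nil": 0}
h1 = {1: 2}  # segment from s(x) to s(y)
h2 = {2: 0}  # segment from s(y) to s(nil)
assert uplus(s, h1, h2) == {1: 2, 2: 0}
# Sharing a location *not* named by the stack is rejected by ⊎ˢ,
# even though + allows it:
assert uplus({"nil": 0}, h1, h2) is None
assert plus(h1, h2) == {1: 2, 2: 0}
```

This also makes the remark above concrete: whenever `uplus` succeeds, `plus` succeeds on the same pair, but not conversely.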

Just like the standard disjoint union +, the operator ⊎ˢ gives rise to a separation algebra, i.e., a cancellative, commutative partial monoid [11]:

**Lemma 1.** Let s be a stack and let u be the empty heap (i.e., dom(u) = ∅). The triple (**H**, ⊎ˢ, u) is a separation algebra.

Weak- and strong-separation logic. Both + and ⊎ˢ can be used to give a semantics to the separating conjunction and septraction. We denote the corresponding model relations wk|= and st|= and define them in Fig. 3. Where the two semantics agree, we simply write |=.

In both semantics, **emp** only holds for the empty heap, and x = y holds for the empty heap when x and y are interpreted by the same location<sup>4</sup>. Points-to assertions x → y are precise, i.e., only hold in singleton heaps. (It is, of course, possible to express intuitionistic points-to assertions by x → y ∗ t.) The list segment predicate ls(x, y) holds in possibly-empty lists of pointers from s(x) to s(y). The semantics of Boolean connectives are standard. The semantics of the separating conjunction, ∗, and septraction, −⊛, differ based on the choice of + vs. ⊎ˢ for combining disjoint heaps. In the former case, denoted wk|=, we get the standard semantics of separation logic (cf. [40]). In the latter case, denoted st|=, we get a semantics that imposes stronger requirements on sub-heap composition: sub-heaps may only overlap at locations that are stored in the stack.

<sup>4</sup> Usually x = y is defined to hold for all heaps, not just the empty heap, when x and y are interpreted by the same location; however, this choice does not change the expressivity of the logic: the formula (x = y) ∗ t expresses the standard semantics. Our choice is needed for the results on the positive fragment considered in Section 2.3
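As a concrete reading of the atomic clauses shared by both semantics, the following Python sketch (ours, not part of the paper; heaps are dicts from locations to locations, stacks map variable names to locations) checks **emp**, points-to, and the list-segment predicate:

```python
def sat_emp(s, h):
    """emp holds only in the empty heap."""
    return len(h) == 0

def sat_pts(s, h, x, y):
    """x → y is precise: it holds only in the singleton heap {s(x) → s(y)}."""
    return h == {s[x]: s[y]}

def sat_ls(s, h, x, y):
    """ls(x, y): the empty heap with s(x) = s(y), or a chain of
    pairwise-distinct locations from s(x) to s(y) covering all of h."""
    if len(h) == 0:
        return s[x] == s[y]
    loc, seen, rest = s[x], set(), dict(h)
    while loc in rest:
        seen.add(loc)
        loc = rest.pop(loc)
        if loc in seen:
            return False          # cycle: not a list segment
    return loc == s[y] and not rest  # must end at s(y) with nothing left over

s = {"x": 1, "y": 3, "nil": 0}
assert sat_ls(s, {1: 2, 2: 3}, "x", "y")   # 1 → 2 → 3
assert not sat_ls(s, {1: 2}, "x", "y")     # chain ends at 2, not s(y)
assert sat_emp(s, {}) and sat_pts(s, {1: 3}, "x", "y")
```
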

Fig. 4: Two models of (ls(a, nil) ∗ t) ∧ (ls(b, nil) ∗ t) for a stack with domain a, b and a stack with domain a, b, c.

Because the semantics st <sup>|</sup>= imposes stronger constraints, we will refer to the standard semantics wk <sup>|</sup>= as the weak semantics of separation logic and to the semantics st <sup>|</sup>= as the strong semantics of separation logic. Moreover, we use the terms weak-separation logic (WSL) and strong-separation logic (SSL) to distinguish between SL with the semantics wk <sup>|</sup>= and st <sup>|</sup>=.

Example 1. Let ϕ := a ≠ b ∗ (ls(a, nil) ∗ t) ∧ (ls(b, nil) ∗ t). In Fig. 4, we show two models of ϕ. On the left, we assume that a, b are the only program variables, whereas on the right, we assume that there is a third program variable c.

Note that the latter model, where the two lists overlap, is possible in SSL only because the lists come together at the location labeled by c. If we removed variable c from the stack, the model would no longer satisfy ϕ according to the strong semantics, because ⊎ˢ would no longer allow splitting the heap at that location. Conversely, the model would still satisfy ϕ under the standard semantics.

This is a feature rather than a bug of SSL: by demanding that the user of SSL specify aliasing explicitly—for example by using the specification ls(a, c) ∗ ls(b, c) ∗ ls(c, nil) ∧ c ≠ nil—we rule out unintended aliasing effects.

Satisfiability and Semantic Consequence. We define the notions of satisfiability and semantic consequence parameterized by a finite set of variables **x** ⊆ **Var**. For a formula ϕ with fvs(ϕ) ⊆ **x**, we say that ϕ is satisfiable w.r.t. **x** if there is a model (s, h) with dom(s) = **x** such that (s, h) st|= ϕ. We say that ϕ entails ψ w.r.t. **x**, in signs ϕ st|=**x** ψ, if for all models (s, h) with dom(s) = **x**, (s, h) st|= ϕ implies (s, h) st|= ψ.

### **2.3 Correspondence of Strong and Weak Semantics on Positive Formulas**

We call an SL formula ϕ positive if it does not contain ¬. Note that, in particular, this implies that ϕ does not contain the magic wand −∗ or the atom t.

In models of positive formulas, all dangling locations are labeled by variables:

**Lemma 2.** Let <sup>ϕ</sup> be positive and (s, h) wk <sup>|</sup><sup>=</sup> <sup>ϕ</sup>. Then, (img(h)\dom(h)) <sup>⊆</sup> img(s).

As every location shared by heaps h1 and h2 in h1 + h2 is dangling either in h1 or in h2 (or both), the operations + and ⊎ˢ coincide on models of positive formulas:

**Lemma 3.** Let (s, h1) wk|= ϕ1 and (s, h2) wk|= ϕ2 for positive formulas ϕ1, ϕ2. Then h1 + h2 ≠ ⊥ iff h1 ⊎ˢ h2 ≠ ⊥.

Since the semantics coincide on atomic formulas by definition and on ∗ by Lemma 3, we can easily show that they coincide on all positive formulas:

**Lemma 4.** Let <sup>ϕ</sup> be a positive formula and let (s, h) be a model. Then (s, h) wk <sup>|</sup><sup>=</sup> <sup>ϕ</sup> iff (s, h) st <sup>|</sup><sup>=</sup> <sup>ϕ</sup>.

By contraposition of Lemma 4, {(s, h) | (s, h) wk|= ϕ} ≠ {(s, h) | (s, h) st|= ϕ} implies that ϕ contains negation, either explicitly or in the form of a magic wand or t. In particular, Lemma 4 implies that the two semantics coincide on the popular symbolic-heap fragment of separation logic.<sup>5</sup>

We remark that the formula ϕ in Example 1 only employs t but not ¬ or −∗. Hence, even if only t were added to the positive fragment, Lemma 4 would no longer hold. Likewise, Lemma 4 does not hold under intuitionistic semantics: as the intuitionistic semantics of a predicate p is equivalent to p ∗ t under classical semantics, it is sufficient to consider ϕ := a ≠ b ∗ (ls(a, nil) ∧ ls(b, nil)).

# **3 Deciding the SSL Satisfiability Problem**

The goal of this section is to develop a decision procedure for SSL:

**Theorem 1.** Let ϕ ∈ **SL** and let **x** ⊆ **Var** be a finite set of variables with fvs(ϕ) <sup>⊆</sup> **<sup>x</sup>**. It is decidable in PSpace (in <sup>|</sup>ϕ<sup>|</sup> and <sup>|</sup>**x**|) whether there exists a model (s, h) with dom(s) = **<sup>x</sup>** and (s, h) st <sup>|</sup><sup>=</sup> <sup>ϕ</sup>.

Our approach is based on abstracting stack–heap models by abstract memory states (AMSs). Two key properties of this abstraction together imply Theorem 1: first, models with the same AMS satisfy the same SSL formulas (the refinement theorem, Theorem 2); second, the set of AMSs of the models of a given formula is computable in polynomial space (Sections 3.4–3.7).


The AMS abstraction is motivated by two insights: (1) every heap decomposes uniquely into minimal separable sub-heaps, its chunks (Sec. 3.1); and (2) to determine which formulas a model satisfies, it suffices to know, for every chunk, which atomic formulas it satisfies and which variables it allocates (Sec. 3.2).


<sup>5</sup> Strictly speaking, Lemma 4 implies this only for the symbolic-heap fragment of the separation logic studied in this paper, i.e., with the list predicate but no other data structures. The result can, however, be generalized to symbolic heaps with trees (see the dissertation of the first author [31]). Symbolic heaps of bounded treewidth as proposed in [21] are an interesting direction for future work.

We proceed as follows. In Sec. 3.1, we make precise the notion of memory chunks. In Sec. 3.2, we define abstract memory states (AMSs), an abstraction of models that retains for every chunk precisely the information from point (2) above. We prove the refinement theorem in Sec. 3.3. We show in Sections 3.4–3.6 that we can compute the AMSs of the models of a given formula ϕ, which allows us to decide satisfiability and entailment problems for SSL. Finally, we prove the PSpace-completeness result in Sec. 3.7.

### **3.1 Memory Chunks**

We will abstract a model (s, h) by abstracting every chunk of h, which is a minimal nonempty sub-heap of (s, h) that can be split off of h according to the strong-separation semantics.

**Definition 1 (Sub-heap).** Let (s, h) be a model. We say that h1 is a sub-heap of h, in signs h1 ⊑ h, if there is some heap h2 such that h = h1 ⊎^s h2. We collect all sub-heaps in the set subHeaps(s, h).

The following proposition is an immediate consequence of the above definition:

**Proposition 1.** Let (s, h) be a model. Then, (subHeaps(s, h), ⊑, ⊔, ⊓, ¬) is a Boolean algebra with greatest element h and smallest element ∅.


The fact that the sub-models form a Boolean algebra allows us to make the following definition<sup>6</sup>:

**Definition 2 (Chunk).** Let (s, h) be a model. A chunk of (s, h) is an atom of the Boolean algebra (subHeaps(s, h), ⊑, ⊔, ⊓, ¬). We collect all chunks of (s, h) in the set chunks(s, h).

Because every element of a Boolean algebra can be uniquely decomposed into atoms, we obtain that every heap can be fully decomposed into its chunks:

**Proposition 2.** Let (s, h) be a model and let chunks(s, h) = {h1,...,hn} be its chunks. Then, h = h1 ⊎^s h2 ⊎^s ··· ⊎^s hn.

Example 2. Let s = {x → 1, y → 3, u → 5, z → 3, w → 7, v → 9} and h = {1 → 2, 2 → 3, 3 → 8, 4 → 6, 5 → 6, 6 → 3, 7 → 6, 9 → 9, 10 → 11, 11 → 10}. The model (s, h) is illustrated in Fig. 5. This time, we include the identities of

<sup>6</sup> It is an interesting question for future work to relate the chunks considered in this paper to the atomic building blocks used in SL symbolic executions engines. Likewise, it would be interesting to build a symbolic execution engine based on the chunks resp. on the AMS abstraction proposed in this paper.

Fig. 5: Graphical representation of a model consisting of five chunks (left, see Ex. 2) and its induced AMS (right, see Ex. 5).

the locations in the graphical representation; e.g., 3: y, z represents location 3 with s(y) = 3 and s(z) = 3. The model consists of five chunks: h1 := {1 → 2, 2 → 3}, h2 := {9 → 9}, h3 := {4 → 6, 5 → 6, 6 → 3, 7 → 6}, h4 := {3 → 8}, and h5 := {10 → 11, 11 → 10}.
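To make the chunk decomposition concrete, here is a small Python sketch; it is our own illustration with our own encodings (stacks and heaps as dictionaries), not part of the paper's development. It relies on the observation that two heap cells must end up in the same chunk exactly when they are connected via a location that no stack variable labels, since splitting them apart would make that unlabeled location shared, which ⊎^s forbids.

```python
# Illustrative sketch (ours, not from the paper): compute the chunks of (s, h).

def chunks(s, h):
    """Return the chunks of (s, h) as a set of frozensets of allocated locations."""
    labeled = set(s.values())            # locations named by some stack variable
    parent = {loc: loc for loc in h}     # union-find over the allocated locations

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    # Merge all cells that touch the same unlabeled location.
    touching = {}                        # unlabeled location -> cells touching it
    for cell, target in h.items():
        for loc in (cell, target):
            if loc not in labeled:
                touching.setdefault(loc, []).append(cell)
    for cells in touching.values():
        for c in cells[1:]:
            parent[find(c)] = find(cells[0])

    groups = {}
    for cell in h:
        groups.setdefault(find(cell), set()).add(cell)
    return {frozenset(g) for g in groups.values()}

# The model of Ex. 2 decomposes into the five chunks h1, ..., h5 listed above.
s = {'x': 1, 'y': 3, 'u': 5, 'z': 3, 'w': 7, 'v': 9}
h = {1: 2, 2: 3, 3: 8, 4: 6, 5: 6, 6: 3, 7: 6, 9: 9, 10: 11, 11: 10}
example_chunks = chunks(s, h)
```

On the model of Ex. 2, this recovers exactly the five chunks {1, 2}, {3}, {4, 5, 6, 7}, {9} and {10, 11}.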

We distinguish two types of chunks: those that satisfy SSL atoms and those that don't.

**Definition 3 (Positive and negative chunk).** Let hc ⊆ h be a chunk of (s, h). hc is a positive chunk if there exists an atomic formula τ such that (s, hc) ⊨_st τ. Otherwise, hc is a negative chunk. We collect the respective chunks in chunks^+(s, h) and chunks^−(s, h).

Example 3. Recall the chunks h1 through h5 from Ex. 2. h1 and h2 are positive chunks (blue in Fig. 5); h3 to h5 are negative chunks (orange).

Negative chunks fall into three (not mutually exclusive) categories:

**Garbage.** Chunks with locations that are inaccessible via stack variables.

**Overlaid structures.** Chunks in which several list segments overlap, so that the chunk satisfies no single atomic formula.

**Dangling pointers.** Chunks that contain dangling pointers whose targets are not labeled by stack variables.
Example 4 (Negative chunks). The chunk h<sup>3</sup> from Example 2 contains garbage, namely the location 4 that cannot be reached via stack variables, and two overlaid list segments (from 5 to 3 and 7 to 3). The chunk h<sup>4</sup> has an unlabeled dangling pointer. The chunk h<sup>5</sup> contains only garbage.

### **3.2 Abstract Memory States**

In abstract memory states (AMSs), we retain for every chunk enough information to (1) determine which atomic formulas the chunk satisfies, and (2) keep track of which variables are allocated within each chunk.

**Definition 4.** A quadruple A = ⟨V, E, ρ, γ⟩ is an abstract memory state if V is a finite set of nodes, E is a partial function from V to V × {=1, ≥2},^7 ρ ⊆ 2^V \ {∅} is a set of nonempty sets of nodes, and γ ∈ N.

We call V the nodes, E the edges, ρ the negative-allocation constraint and γ the garbage-chunk count of A. We call the AMS A = ⟨V, E, ρ, γ⟩ garbage-free if ρ = ∅ and γ = 0.

We collect the set of all AMSs in **AMS**. The size of A is given by |A| := |V| + γ. Finally, the allocated variables of an AMS are given by **alloc**(A) := dom(E) ∪ ⋃ρ.

Every model induces an AMS, defined in terms of the following auxiliary definitions. The equivalence class of a variable x with regard to stack s is [x]^s_= := {y | s(y) = s(x)}; the set of all equivalence classes of a given stack s is cls_=(s) := {[x]^s_= | x ∈ dom(s)}. We now define the edges induced by a model (s, h). For every equivalence class [x]^s_= ∈ cls_=(s), we set

$$\mathsf{edges}(s,h)([x]_{=}^{s}) := \begin{cases} \langle [y]_{=}^{s}, {=}1 \rangle & \text{if there are } y \in \mathsf{dom}(s) \text{ and } h_{c} \in \mathsf{chunks}^{+}(s,h) \\ & \text{with } (s,h_{c}) \models_{\mathsf{st}} x \mapsto y, \\ \langle [y]_{=}^{s}, {\geq}2 \rangle & \text{if there are } y \in \mathsf{dom}(s) \text{ and } h_{c} \in \mathsf{chunks}^{+}(s,h) \\ & \text{with } (s,h_{c}) \models_{\mathsf{st}} \mathbf{ls}(x,y) \wedge \neg(x \mapsto y), \\ \bot & \text{otherwise.} \end{cases}$$

Finally, we denote the sets of variables allocated in negative chunks by

$$\mathsf{alloc}^{-}(s,h) := \{ \{ [x]_{=}^{s} \mid s(x) \in \mathsf{dom}(h_{c}) \} \mid h_{c} \in \mathsf{chunks}^{-}(s,h) \} \setminus \{ \emptyset \},$$

where (equivalence classes of) variables that are allocated in the same negative chunk are grouped together in a set.

Now we are ready to define the induced AMS of a model.

**Definition 5.** Let (s, h) be a model. Let V := cls_=(s), E := edges(s, h), ρ := alloc^−(s, h) and γ := |chunks^−(s, h)| − |alloc^−(s, h)|.

Then, we say that ams(s, h) := ⟨V, E, ρ, γ⟩ is the induced AMS of (s, h).

Example 5. The induced AMS of the model (s, h) from Ex. 2 is illustrated on

the right-hand side of Fig. 5. The blue box depicts the graph (V, E) induced by the positive chunks h1, h2; the negative chunks that allocate variables are abstracted to the set ρ = {{{w}, {u}}, {{y, z}}} (note that the variables w and u are allocated in the chunk h3 and the aliasing variables y, z are allocated in h4); and the garbage-chunk count is 1, because h5 is the only negative chunk that does not allocate stack variables.
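Continuing the illustration in Python (again our own sketch with our own encodings, not the paper's), the components E, ρ and γ of the induced AMS can be computed from the chunks of Ex. 2, which we take as given here. The positivity check below covers exactly the pointer and list-segment atoms over the given stack; nil and data fields are not modelled.

```python
# Illustrative sketch (ours): compute E, rho, gamma for the model of Ex. 2,
# taking its five chunks as given in the text above.

s = {'x': 1, 'y': 3, 'u': 5, 'z': 3, 'w': 7, 'v': 9}
chunk_list = [
    {1: 2, 2: 3},              # h1 (positive: list segment from x to y)
    {9: 9},                    # h2 (positive: pointer v -> v)
    {4: 6, 5: 6, 6: 3, 7: 6},  # h3 (negative: garbage + overlaid lists)
    {3: 8},                    # h4 (negative: unlabeled dangling pointer)
    {10: 11, 11: 10},          # h5 (negative: garbage only)
]

def cls(v):
    """Equivalence class [v]^s_= as a frozenset of variables."""
    return frozenset(w for w in s if s[w] == s[v])

def as_list_segment(hc):
    """If hc is a nonempty list segment from s(x) to s(y), return (x, y, length)."""
    for x in s:
        loc, n = s[x], 0
        while loc in hc and n < len(hc):   # follow next-pointers inside the chunk
            loc, n = hc[loc], n + 1
        if n == len(hc):                   # the whole chunk is one path from s(x)
            for y in s:
                if s[y] == loc:            # ... ending in a labeled location
                    return (x, y, n)
    return None

E, rho, gamma = {}, set(), 0
for hc in chunk_list:
    seg = as_list_segment(hc)
    if seg is not None:                    # positive chunk: contributes an edge
        x, y, n = seg
        E[cls(x)] = (cls(y), '=1' if n == 1 else '>=2')
    else:                                  # negative chunk: feeds rho or gamma
        alloc = frozenset(cls(v) for v in s if s[v] in hc)
        if alloc:
            rho.add(alloc)
        else:
            gamma += 1
```

For Ex. 2 this yields the edge [x] → ⟨[y, z], ≥2⟩ from h1, the edge [v] → ⟨[v], =1⟩ from h2, ρ = {{{u}, {w}}, {{y, z}}} and γ = 1, matching the AMS in Fig. 5.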

^7 The edges of an AMS represent either a single pointer (case "=1") or a list segment of length at least 2 (case "≥2").

Observe that the induced AMS is indeed an AMS:

**Proposition 3.** Let (s, h) be a model. Then ams(s, h) ∈ **AMS**.

The reverse also holds: Every AMS is the induced AMS of at least one model; in fact, even of a model of linear size.

**Lemma 5 (Realizability of AMS).** Let A = ⟨V, E, ρ, γ⟩ be an AMS. There exists a model (s, h) with ams(s, h) = A whose size is linear in the size of A.

The following lemma demonstrates that we only need the ρ and γ components in order to be able to deal with negation and/or the magic wand:

**Lemma 6 (Models of Positive Formulas have Garbage-free Abstractions).** Let (s, h) be a model. If (s, h) |= ϕ for a positive formula ϕ, then ams(s, h) is garbage-free.

We abstract SL formulas by the set of AMS of their models:

**Definition 6.** Let s be a stack. The **SL** abstraction w.r.t. s, α_s : **SL** → 2^**AMS**, is given by

$$\alpha_s(\varphi) := \{ \mathsf{ams}(s, h) \mid h \in \mathbf{H} \text{ and } (s, h) \models_{\mathsf{st}} \varphi \}. \tag{7}$$

Because AMSs do not retain any information about heap locations, just about aliasing, abstractions do not differ for stacks with the same equivalence classes:

**Lemma 7.** Let s, s′ be stacks with cls_=(s) = cls_=(s′). Then α_s(ϕ) = α_{s′}(ϕ) for all formulas ϕ.

### **3.3 The Refinement Theorem for SSL**

The main goal of this section is to show the following refinement theorem:

**Theorem 2 (Refinement Theorem).** Let ϕ be a formula and let (s, h1), (s, h2) be models with ams(s, h1) = ams(s, h2). Then (s, h1) ⊨_st ϕ iff (s, h2) ⊨_st ϕ.

We will prove this theorem step by step, characterizing the AMS abstraction of all atomic formulas and of the composed models before proving the refinement theorem. In the remainder of this section, we fix some model (s, h).

Abstract Memory States of Atomic Formulas The empty-heap predicate **emp** is only satisfied by the empty heap, i.e., by a heap that consists of zero chunks:

**Lemma 8.** (s, h) ⊨ **emp** iff ams(s, h) = ⟨cls_=(s), ∅, ∅, 0⟩.

**Lemma 9.** 1. (s, h) ⊨ x = y iff ams(s, h) = ⟨cls_=(s), ∅, ∅, 0⟩ and [x]^s_= = [y]^s_=. 2. (s, h) ⊨ x ≠ y iff ams(s, h) = ⟨cls_=(s), ∅, ∅, 0⟩ and [x]^s_= ≠ [y]^s_=.

Models of points-to assertions consist of a single positive chunk of size 1:

**Lemma 10.** Let E = {[x]^s_= ↦ ⟨[y]^s_=, =1⟩}. (s, h) ⊨ x ↦ y iff ams(s, h) = ⟨cls_=(s), E, ∅, 0⟩.

Intuitively, the list segment ls(x, y) is satisfied by models (s, h) that consist of zero or more positive chunks, corresponding to a (possibly empty) list from some equivalence class [x]^s_= to [y]^s_= via (zero or more) intermediate equivalence classes [x1]^s_=, ..., [xn]^s_=. We will use this intuition to define abstract lists; this notion allows us to characterize the AMSs arising from abstracting lists.

**Definition 7.** Let A = ⟨V, E, ρ, γ⟩ ∈ **AMS**, let s be a stack and x, y ∈ **Var**. We say A is an abstract list w.r.t. x and y, in signs A ∈ **AbstLists**(x, y), iff


**Lemma 11.** (s, h) |= ls(x, y) iff ams(s, h) ∈ **AbstLists**(x, y).

Abstract Memory States of Models Composed by the Union Operator Our next goal is to lift the union operator ⊎^s to the abstract domain **AMS**. We will define an operator • with the following property:

$$\text{if } h_1 \uplus^s h_2 \neq \bot \text{ then } \mathsf{ams}(s, h_1 \uplus^s h_2) = \mathsf{ams}(s, h_1) \bullet \mathsf{ams}(s, h_2).$$

AMS composition is a partial operation defined only on compatible AMSs. Compatibility enforces (1) that the AMSs were obtained for equivalent stacks (i.e., for stacks s, s′ with cls_=(s) = cls_=(s′)), and (2) that there is no double allocation.

**Definition 8 (Compatibility of AMSs).** AMSs A1 = ⟨V1, E1, ρ1, γ1⟩ and A2 = ⟨V2, E2, ρ2, γ2⟩ are compatible iff (1) V1 = V2 and (2) **alloc**(A1) ∩ **alloc**(A2) = ∅.

Note that if h1 ⊎^s h2 is defined, then ams(s, h1) and ams(s, h2) are compatible. The converse is not true, because ams(s, h1) and ams(s, h2) may be compatible even if dom(h1) ∩ dom(h2) ≠ ∅.

AMS composition is defined in a point-wise manner on compatible AMSs and undefined otherwise.

**Definition 9 (AMS composition).** Let A_i = ⟨V_i, E_i, ρ_i, γ_i⟩ for i = 1, 2 be two AMSs. The composition of A1, A2 is then given by

$$\mathcal{A}\_1 \bullet \mathcal{A}\_2 := \begin{cases} \langle V\_1, E\_1 \cup E\_2, \rho\_1 \cup \rho\_2, \gamma\_1 + \gamma\_2 \rangle, & \text{if } \mathcal{A}\_1, \mathcal{A}\_2 \text{ } compatible, \\ \bot, & \text{otherwise.} \end{cases}$$
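As an illustration, Def. 8 and Def. 9 can be transcribed directly into Python; the encoding of an AMS as a tuple (V, E, ρ, γ) of hashable frozensets, with E a frozenset of (source, (target, label)) pairs, is our own choice, not the paper's notation.

```python
# Sketch of Def. 8 and Def. 9 (our own tuple/frozenset encoding of AMSs).

def alloc(ams):
    """Allocated variables of an AMS: dom(E) together with the union of rho."""
    V, E, rho, gamma = ams
    allocated = {src for src, _ in E}
    for group in rho:
        allocated |= group
    return allocated

def compose(a1, a2):
    """A1 • A2 per Def. 9; None plays the role of the undefined value."""
    (V1, E1, r1, g1), (V2, E2, r2, g2) = a1, a2
    if V1 != V2 or alloc(a1) & alloc(a2):   # incompatible per Def. 8
        return None
    return (V1, E1 | E2, r1 | r2, g1 + g2)

# A one-pointer AMS composed with an AMS whose rho allocates [y]:
V = frozenset({frozenset({'x'}), frozenset({'y'})})
a1 = (V, frozenset({(frozenset({'x'}), (frozenset({'y'}), '=1'))}), frozenset(), 0)
a2 = (V, frozenset(), frozenset({frozenset({frozenset({'y'})})}), 1)
```

Composing a1 with itself (or a2 with itself) returns None, since the same variable class would be allocated twice.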

**Lemma 12.** Let s be a stack and let h1, h2 be heaps. If h1 ⊎^s h2 ≠ ⊥ then ams(s, h1) • ams(s, h2) ≠ ⊥.

We next show that ams(s, h1 ⊎^s h2) = ams(s, h1) • ams(s, h2) whenever h1 ⊎^s h2 is defined:

**Lemma 13 (Homomorphism of composition).** Let (s, h1), (s, h2) be models with h1 ⊎^s h2 ≠ ⊥. Then, ams(s, h1 ⊎^s h2) = ams(s, h1) • ams(s, h2).

To show the refinement theorem, we need one additional property of AMS composition. If the AMS of a model (s, h) can be decomposed into two smaller AMSs, ams(s, h) = A1 • A2, then it is also possible to decompose the heap h into smaller heaps h1, h2 with ams(s, h_i) = A_i:

**Lemma 14 (Decomposability of AMS).** Let ams(s, h) = A1 • A2. There exist h1, h2 with h = h1 ⊎^s h2, ams(s, h1) = A1 and ams(s, h2) = A2.

These results suffice to prove the Refinement Theorem stated at the beginning of this section (see the extended version [33] for a proof).

**Corollary 1.** Let (s, h) be a model and ϕ be a formula. (s, h) ⊨_st ϕ iff ams(s, h) ∈ α_s(ϕ).

### **3.4 Recursive Equations for Abstract Memory States**

In this section, we derive recursive equations that reduce the set of AMSs α_s(ϕ) of an arbitrary compound formula ϕ to the sets of AMSs of its constituent formulas. In the following sections, we will show that we can actually evaluate these equations, thus obtaining an algorithm for computing the abstraction of arbitrary formulas.

**Lemma 15.** αs(ϕ<sup>1</sup> ∧ ϕ2) = αs(ϕ1) ∩ αs(ϕ2).

**Lemma 16.** αs(ϕ<sup>1</sup> ∨ ϕ2) = αs(ϕ1) ∪ αs(ϕ2).

**Lemma 17.** αs(¬ϕ1) = {ams(s, h) | h ∈ **H**} \ αs(ϕ1).

The Separating Conjunction In Section 3.3, we defined the composition operation, •, on pairs of AMS. We now lift this operation to sets of AMS **A**1, **A**2:

$$\mathbf{A}\_1 \bullet \mathbf{A}\_2 := \{ \mathcal{A}\_1 \bullet \mathcal{A}\_2 \mid \mathcal{A}\_1 \in \mathbf{A}\_1, \mathcal{A}\_2 \in \mathbf{A}\_2, \mathcal{A}\_1 \bullet \mathcal{A}\_2 \neq \bot \}\ .$$

Lemma 13 implies that α<sup>s</sup> is a homomorphism from formulas and ∗ to sets of AMS and •:

**Lemma 18.** For all ϕ1, ϕ2, αs(ϕ<sup>1</sup> ∗ ϕ2) = αs(ϕ1) • αs(ϕ2).

The septraction operator. We next define an abstract septraction operator −• that relates to • in the same way that −⊛ relates to ∗. For two sets of AMSs **A**1, **A**2 we set:

**A**1 −• **A**2 := {A ∈ **AMS** | there exists A1 ∈ **A**1 s.t. A • A1 ∈ **A**2}

Then, α_s is a homomorphism from formulas and −⊛ to sets of AMSs and −•:

**Lemma 19.** For all ϕ1, ϕ2, α_s(ϕ1 −⊛ ϕ2) = α_s(ϕ1) −• α_s(ϕ2).

### **3.5 Refining the Refinement Theorem: Bounding Garbage**

Even though we have now characterized the set α_s(ϕ) for every formula ϕ, we do not yet have a way to implement the AMS computation: While α_s(ϕ) is finite if ϕ is a spatial atom, the set is infinite in general; see the cases α_s(¬ϕ) and α_s(ϕ1 −⊛ ϕ2). However, we note that for a fixed stack s only the garbage-chunk count γ of an AMS ⟨V, E, ρ, γ⟩ ∈ α_s(ϕ) can be of arbitrary size, while the size of the nodes V, the edges E and the negative-allocation constraint ρ is bounded by |s|. Fortunately, to decide the satisfiability of any fixed formula ϕ, it is not necessary to keep track of arbitrarily large garbage-chunk counts.

We introduce the chunk size ⌈ϕ⌉ of a formula ϕ, which provides an upper bound on the number of chunks that may be necessary to satisfy and/or falsify the formula; ⌈ϕ⌉ is defined as follows:

$$\begin{array}{l}
\lceil \mathbf{emp} \rceil = \lceil x \mapsto y \rceil = \lceil \mathbf{ls}(x,y) \rceil = \lceil x = y \rceil = \lceil x \neq y \rceil := 1 \\
\lceil \varphi * \psi \rceil := \lceil \varphi \rceil + \lceil \psi \rceil \\
\lceil \varphi \mathrel{-\!\!\circledast} \psi \rceil := \lceil \psi \rceil \\
\lceil \varphi \wedge \psi \rceil = \lceil \varphi \vee \psi \rceil := \max(\lceil \varphi \rceil, \lceil \psi \rceil) \\
\lceil \neg \varphi \rceil := \lceil \varphi \rceil.
\end{array}$$

Observe that ⌈ϕ⌉ ≤ |ϕ| for all ϕ. Intuitively, ⌈ϕ⌉ − 1 is an upper bound on the number of times the operation ⊎^s needs to be applied when checking whether (s, h) ⊨_st ϕ. For example, let ψ := x ↦ y ∗ ((b ↦ c) −⊛ ls(a, c)). Then ⌈ψ⌉ = 2, and to verify that ψ holds in a model that consists of a pointer from x to y and a list segment from a to b, it suffices to split this model ⌈ψ⌉ − 1 = 1 times using ⊎^s (into the pointer and the list segment).
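The recursion above can be transcribed one-to-one; the tuple encoding of formulas ('star', ...), ('septraction', ...), and so on below is our own illustration, not the paper's notation.

```python
# Sketch (our own formula encoding): the chunk size of a formula, following
# the recursion above.  Atoms are tuples such as ('ptsto', 'x', 'y'),
# ('ls', 'a', 'c'), ('emp',), ('eq', 'x', 'y') or ('neq', 'x', 'y').

def chunk_size(phi):
    op = phi[0]
    if op == 'star':
        return chunk_size(phi[1]) + chunk_size(phi[2])
    if op == 'septraction':
        return chunk_size(phi[2])
    if op in ('and', 'or'):
        return max(chunk_size(phi[1]), chunk_size(phi[2]))
    if op == 'not':
        return chunk_size(phi[1])
    return 1                         # all atoms have chunk size 1

# psi = x -> y * ((b -> c) septraction ls(a, c)) from the example above
psi = ('star', ('ptsto', 'x', 'y'),
       ('septraction', ('ptsto', 'b', 'c'), ('ls', 'a', 'c')))
```

Evaluating `chunk_size(psi)` gives 2, as computed in the text.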

We generalize the refinement theorem, Theorem 2, to models whose AMS differ in their garbage-chunk count, provided both garbage-chunk counts exceed the chunk size of the formula:

**Theorem 3 (Refined Refinement Theorem).** Let ϕ be a formula with ⌈ϕ⌉ = k. Let m ≥ k, n ≥ k and let (s, h1) and (s, h2) be models such that ams(s, h1) = ⟨V, E, ρ, m⟩ and ams(s, h2) = ⟨V, E, ρ, n⟩. Then, (s, h1) ⊨_st ϕ iff (s, h2) ⊨_st ϕ.

This implies that ϕ is satisfiable over stack s iff ϕ is satisfiable by a heap that contains at most ⌈ϕ⌉ garbage chunks:

**Corollary 2.** Let ϕ be a formula with ⌈ϕ⌉ = k. Then ϕ is satisfiable over stack s iff there exists a heap h such that (1) ams(s, h) = ⟨V, E, ρ, γ⟩ for some γ ≤ k and (2) (s, h) ⊨_st ϕ.

### **3.6 Deciding SSL by AMS Computation**

Due to Cor. 2, we can decide the SSL satisfiability problem by means of a function abst_s(ϕ) that computes the (finite) intersection of the (possibly infinite) set α_s(ϕ) and the (finite) set **AMS**_{k,s} := {⟨V, E, ρ, γ⟩ ∈ **AMS** | V = cls_=(s) and γ ≤ k} for k = ⌈ϕ⌉. We define abst_s(ϕ) in Fig. 6. For atomic predicates we only need to consider garbage-chunk count 0, whereas the cases ∗, −⊛, ∧ and ∨ require lifting the bound on the garbage-chunk count from m to n ≥ m.

$$\begin{array}{rcl}
\mathsf{abst}_s(\mathbf{emp}) &:=& \{ \langle \mathsf{cls}_=(s), \emptyset, \emptyset, 0 \rangle \} \\
\mathsf{abst}_s(x = y) &:=& \text{if } s(x) = s(y) \text{ then } \{ \langle \mathsf{cls}_=(s), \emptyset, \emptyset, 0 \rangle \} \text{ else } \emptyset \\
\mathsf{abst}_s(x \neq y) &:=& \text{if } s(x) \neq s(y) \text{ then } \{ \langle \mathsf{cls}_=(s), \emptyset, \emptyset, 0 \rangle \} \text{ else } \emptyset \\
\mathsf{abst}_s(x \mapsto y) &:=& \{ \langle \mathsf{cls}_=(s), \{ [x]_=^s \mapsto \langle [y]_=^s, {=}1 \rangle \}, \emptyset, 0 \rangle \} \\
\mathsf{abst}_s(\mathbf{ls}(x,y)) &:=& \mathbf{AbstLists}(x,y) \cap \mathbf{AMS}_{0,s} \\
\mathsf{abst}_s(\varphi_1 * \varphi_2) &:=& \big( \mathsf{lift}_{\lceil \varphi_1 \rceil \nearrow k}(\mathsf{abst}_s(\varphi_1)) \bullet \mathsf{lift}_{\lceil \varphi_2 \rceil \nearrow k}(\mathsf{abst}_s(\varphi_2)) \big) \cap \mathbf{AMS}_{k,s} \\
\mathsf{abst}_s(\varphi_1 \mathrel{-\!\!\circledast} \varphi_2) &:=& \big( \mathsf{abst}_s(\varphi_1) \mathrel{-\!\bullet} \mathsf{lift}_{\lceil \varphi_2 \rceil \nearrow \lceil \varphi_1 \rceil + \lceil \varphi_2 \rceil}(\mathsf{abst}_s(\varphi_2)) \big) \cap \mathbf{AMS}_{k,s} \\
\mathsf{abst}_s(\varphi_1 \wedge \varphi_2) &:=& \mathsf{lift}_{\lceil \varphi_1 \rceil \nearrow k}(\mathsf{abst}_s(\varphi_1)) \cap \mathsf{lift}_{\lceil \varphi_2 \rceil \nearrow k}(\mathsf{abst}_s(\varphi_2)) \\
\mathsf{abst}_s(\varphi_1 \vee \varphi_2) &:=& \mathsf{lift}_{\lceil \varphi_1 \rceil \nearrow k}(\mathsf{abst}_s(\varphi_1)) \cup \mathsf{lift}_{\lceil \varphi_2 \rceil \nearrow k}(\mathsf{abst}_s(\varphi_2)) \\
\mathsf{abst}_s(\neg \varphi_1) &:=& \mathbf{AMS}_{k,s} \setminus \mathsf{abst}_s(\varphi_1)
\end{array}$$

where k abbreviates the chunk size ⌈·⌉ of the formula on the left-hand side of each equation.

Fig. 6: Computing the abstract memory states of the models of ϕ with stack s.

**Definition 10.** Let m, n ∈ N with m ≤ n and let A = ⟨V, E, ρ, γ⟩ ∈ **AMS**. The bound-lifting of A from m to n is

$$\text{lift}\_{m \nearrow n}(\mathcal{A}) := \begin{cases} \{\mathcal{A}\} & \text{if } \gamma < m \\ \{ \langle V, E, \rho, k \rangle \mid m \le k \le n \} & \text{if } \gamma = m. \end{cases}$$

We generalize bound-lifting to sets of AMSs: lift_{m↗n}(**A**) := ⋃_{A ∈ **A**} lift_{m↗n}(A).
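Def. 10 is directly executable; the encoding of an AMS as a tuple of hashable frozensets below is our own illustration.

```python
# Sketch of Def. 10 (our own encoding of AMSs as tuples (V, E, rho, gamma)).

def lift(m, n, ams):
    """Bound-lifting of a single AMS from m to n (m <= n)."""
    V, E, rho, gamma = ams
    if gamma < m:
        return {ams}
    if gamma == m:
        return {(V, E, rho, k) for k in range(m, n + 1)}
    raise ValueError("lift expects gamma <= m")

def lift_set(m, n, amss):
    """Bound-lifting of a set of AMSs: the union of the pointwise liftings."""
    result = set()
    for a in amss:
        result |= lift(m, n, a)
    return result
```

For example, lifting an AMS with γ = 1 from 1 to 3 yields the three AMSs with γ ∈ {1, 2, 3}, while lifting it from 2 to 3 leaves it unchanged.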

As a consequence of Theorem 3, bound-lifting is sound for all n ≥ ⌈ϕ⌉, i.e.,

$$\mathsf{lift}_{\lceil \varphi \rceil \nearrow n}(\alpha_s(\varphi) \cap \mathbf{AMS}_{\lceil \varphi \rceil, s}) = \alpha_s(\varphi) \cap \mathbf{AMS}_{n,s}.$$

By combining this observation with the lemmas characterizing α_s, that is, Lemmas 8–11 and 15–19, we obtain the correctness of abst_s(ϕ):

**Theorem 4.** Let s be a stack and ϕ be a formula. Then, abst_s(ϕ) = α_s(ϕ) ∩ **AMS**_{⌈ϕ⌉,s}.

Computability of abst_s(ϕ). We note that the operators •, −•, ∩, ∪ and \ are all computable, as the sets that occur in the definition of abst_s(ϕ) are all finite. It remains to argue that we can compute the set of AMSs for all atomic formulas. This is trivial for **emp**, (dis-)equalities, and points-to assertions. For the list-segment predicate, we note that the set abst_s(ls(x, y)) = **AbstLists**(x, y) ∩ **AMS**_{0,s} can easily be computed, as there are only finitely many abstract lists w.r.t. the set of nodes V = cls_=(s). We obtain the following results:

**Corollary 3.** Let s be a (finite) stack. Then abst_s(ϕ) is computable for all formulas ϕ.

**Theorem 5.** Let ϕ ∈ **SL** and let **x** ⊆ **Var** be a finite set of variables with fvs(ϕ) ⊆ **x**. It is decidable whether there exists a model (s, h) with dom(s) = **x** and (s, h) ⊨_st ϕ.

**Corollary 4.** The entailment ϕ ⊨_**x** ψ (under strong-separation semantics) is decidable for all finite sets of variables **x** ⊆ **Var** and ϕ, ψ ∈ **SL**.

qbf_to_sl(F) := **emp** ∧ ⋀_{pairwise different QBF variables x, y} x ≠ y ∧ aux(F)

aux(x) := (x ↦ nil) ∗ t
aux(¬x) := ¬aux(x)
aux(F ∧ G) := aux(F) ∧ aux(G)
aux(F ∨ G) := aux(F) ∨ aux(G)
aux(∃x. F) := (x ↦ nil ∨ **emp**) −⊛ aux(F)
aux(∀x. F) := (x ↦ nil ∨ **emp**) −∗ aux(F)

Fig. 7: Translation qbf_to_sl(F) from a closed QBF formula F (in negation normal form) to a formula that is satisfiable iff F is true.

### **3.7 Complexity of the SSL Satisfiability Problem**

It is easy to see that the algorithm abst_s(ϕ) runs in exponential time. We conclude this section with a proof that SSL satisfiability and entailment are actually PSpace-complete.

PSpace-hardness. An easy reduction from quantified Boolean formulas (QBF) shows that the SSL satisfiability problem is PSpace-hard. The reduction is presented in Fig. 7. We encode positive literals x by (x ↦ nil) ∗ t (the heap contains the pointer x ↦ nil) and negative literals ¬x by ¬((x ↦ nil) ∗ t) (the heap does not contain the pointer x ↦ nil). The magic wand is used to simulate universal quantifiers (i.e., to enforce that we consider both the case x ↦ nil and the case **emp**, setting x both to true and to false). Analogously, septraction is used to simulate existential quantifiers. Similar reductions can be found (for standard SL) in [12].
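The translation of Fig. 7 is easy to implement; the tuple encodings of QBF and SL formulas below are our own illustration, not the paper's notation.

```python
# Sketch of qbf_to_sl from Fig. 7 (our own tuple encodings of formulas).
# QBF: ('var', x), ('not', F), ('and'/'or', F, G), ('exists'/'forall', x, F).
# SL:  ('ptsto', x, 'nil'), ('true',), ('emp',), ('neq', x, y),
#      ('star'/'and'/'or', F, G), ('not', F), ('septraction'/'wand', F, G).

def aux(F):
    op = F[0]
    if op == 'var':                       # literal x:  (x -> nil) * t
        return ('star', ('ptsto', F[1], 'nil'), ('true',))
    if op == 'not':
        return ('not', aux(F[1]))
    if op in ('and', 'or'):
        return (op, aux(F[1]), aux(F[2]))
    # quantifiers: septraction for exists, magic wand for forall
    binder = 'septraction' if op == 'exists' else 'wand'
    return (binder, ('or', ('ptsto', F[1], 'nil'), ('emp',)), aux(F[2]))

def qbf_to_sl(F, variables):
    """emp, conjoined with pairwise disequalities of the QBF variables and aux(F)."""
    phi = ('emp',)
    for i, x in enumerate(variables):
        for y in variables[i + 1:]:
            phi = ('and', phi, ('neq', x, y))
    return ('and', phi, aux(F))
```

For instance, the valid formula ∀x. x ∨ ¬x over the single variable x is translated to **emp** ∧ ((x ↦ nil ∨ **emp**) −∗ (((x ↦ nil) ∗ t) ∨ ¬((x ↦ nil) ∗ t))).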

**Lemma 20.** The SSL satisfiability problem is PSpace-hard (even without the ls predicate).

Note that this reduction simultaneously proves the PSpace-hardness of SSL model checking: If F is a QBF formula over variables x1,...,xk, then qbf_to_sl(F) is satisfiable iff ({x_i ↦ ℓ_i | 1 ≤ i ≤ k}, ∅) ⊨_st qbf_to_sl(F) for some locations ℓ_i with ℓ_i ≠ ℓ_j for i ≠ j.

PSpace-membership. For every stack s and every bound on the garbage-chunk count of the AMS we consider, it is possible to encode every AMS by a string of polynomial length.

**Lemma 21.** Let k ∈ N, let s be a stack and n := k + |s|. There exists an injective function encode : **AMS**_{k,s} → {0, 1}^∗ such that

$$|\mathsf{encode}(\mathcal{A})| \in \mathcal{O}(n\log(n)) \quad \text{for all } \mathcal{A} \in \mathbf{AMS}\_{k,s}.$$

An enumeration-based implementation of the algorithm in Fig. 6 (that has to keep in memory at most one AMS per subformula at any point in the computation) therefore runs in PSpace:

**Lemma 22.** Let ϕ ∈ **SL** and let **x** ⊆ **Var** be a finite set of variables with fvs(ϕ) ⊆ **x**. It is decidable in PSpace (in |ϕ| and |**x**|) whether there exists a model (s, h) with dom(s) = **x** and (s, h) ⊨_st ϕ.

The PSpace-completeness result, Theorem 1, follows by combining Lemmas 20 and 22.

$$\begin{array}{cc}
\{x \mapsto z\}\; x.\mathtt{next} := y\; \{x \mapsto y\} & \{\mathbf{emp}\}\; \mathtt{malloc}(x)\; \{x \mapsto m\} \\[4pt]
\{x \mapsto z\}\; \mathtt{free}(x)\; \{\mathbf{emp}\} & \{\mathbf{emp}\}\; x := y\; \{x = y\} \\[4pt]
\{y \mapsto z\}\; x := y.\mathtt{next}\; \{y \mapsto z * x = z\} & x \text{ different from } y \\[4pt]
\{\mathbf{emp}\}\; \mathtt{assume}(\varphi)\; \{\varphi\} &
\end{array}$$

Fig. 8: Local proof rules of program statements for forward symbolic execution.

$$\text{Frame rule } \frac{\{P\} \, c \, \{Q\}}{\{A \ast P\} \, c \, \{A[\mathbf{x}'/\mathbf{x}] \ast Q\}} \, \mathbf{x} = \text{modified} \text{Vars}(c), \, \mathbf{x}' \text{ fresh}$$

$$\text{Materialization } \frac{\{P\}\, c\, \{Q\}}{\{P\}\, c\, \{x \mapsto z * ((x \mapsto z) \mathrel{-\!\!\circledast} Q)\}}\; Q \models_{\mathsf{st}} \neg((x \mapsto \mathtt{nil}) \mathrel{-\!\!\circledast} \mathbf{t}),\; z \text{ fresh}$$

Fig. 9: The frame and the materialization rule for forward symbolic execution.

# **4 Program Verification with Strong-Separation Logic**

Our main practical motivation behind SSL is to obtain a decidable logic that can be used for fully automatically discharging verification conditions (VCs) in a Hoare-style verification proof. Discharging VCs can be automated by calculi that symbolically execute pre-conditions forward (resp. post-conditions backward) and then invoke an entailment checker. Symbolic execution calculi typically either introduce first-order quantifiers or fresh variables in order to deal with updates to the program variables. We leave the extension of SSL with quantifiers for future work and, in this paper, develop a forward symbolic execution calculus based on fresh variables.

We target the usual Hoare-style setting where a verification engineer annotates the pre- and post-condition of a function and provides loop invariants. We exemplify two annotated functions in Fig. 10; the left function reverses a list and the right function copies a list. In addition to the program variables, our annotations may contain logical variables (also known as ghost variables); for example, the annotations of list reverse only contain program variables, while the annotations of list copy also contain the logical variable u (which is assumed to be equal to x in the pre-condition)<sup>8</sup>.

Forward Symbolic Execution Rules. We state local proof rules for a simple heap-manipulating programming language in Fig. 8. We remark that we do not include a rule for the statement x := x.next for ease of exposition; however, this is w.l.o.g. because x := x.next can be simulated by the statements y := x.next; x := y at the expense of introducing an additional program variable y. Our only non-standard choice is the modelling of the malloc statement: we assume a special program variable m, which is never referenced by any program statement and only used

<sup>8</sup> m is a special program variable introduced for modelling malloc.

in the modelling; the malloc statement updates the value of the variable m to the target of the newly allocated memory cell. This modelling justifies the proof rule for malloc stated in Fig. 8. For a small-step operational semantics of our program statements we refer the reader to the extended version [33].

The rules for the program statements in Fig. 8 are local in the sense that they only deal with a single pointer or the empty heap. The rules in Fig. 9 are the main rules of our forward symbolic execution calculus. The frame rule is essential for lifting the local proof rules to larger heaps. The materialization rule ensures that the frame rule can be applied whenever the pre-condition of a local proof rule can be met. We now give more details.

For a sequence of program statements **c** = c1 ··· ck and a pre-condition P_start, the symbolic execution calculus derives triples {P_start} c1 ··· ci {Q_i} for all 1 ≤ i ≤ k. In order to proceed from i to i + 1, either (1) only the frame rule is applied, or (2) the materialization rule is applied first, followed by an application of the frame rule. The frame rule can be applied if the formula Q_i has the shape Q_i = A ∗ P, where A is suitably chosen and P is the pre-condition of the local proof rule for statement c_i. Then, Q_{i+1} is given by Q_{i+1} = A[**x**′/**x**] ∗ Q, where **x** = modifiedVars(c_i), **x**′ are fresh copies of the variables **x**, and Q is the right-hand side of the local proof rule for statement c_i, i.e., we have {P} c_i {Q}. Note that the frame rule requires substituting the modified program variables with fresh copies: We set modifiedVars(c) := {x, m} for c = malloc(x), modifiedVars(c) := {x} for c = x := y.next and c = x := y, and modifiedVars(c) := ∅ otherwise.

The materialization rule may be applied in order to ensure that Q_i has the shape Q_i = A ∗ P. This is not needed in case P = **emp** but may be necessary for P = x ↦ y. We note that Q_i guarantees that a pointer x is allocated iff Q_i ⊨_st ¬((x ↦ nil) −⊛ t). Under this condition, the rule allows introducing a name z for the target of the pointer x. We remark that while backward symbolic execution calculi typically employ the magic wand, our forward calculus makes use of the dual septraction operator; this allowed us to design a general rule that guarantees a predicate of shape Q_i = A ∗ P without the need of coming up with dedicated rules for, e.g., unfolding list predicates.
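A single proof step with the frame rule can be sketched as follows; the tuple encodings of statements and formulas and the helper names are our own, and the shape check and substitution mirror the calculus only at the syntactic level.

```python
# Sketch (ours): one forward step with the frame rule on tuple-encoded
# formulas.  Statements are tuples like ('setnext', x, y) for x.next := y,
# ('malloc', x), ('next', x, y) for x := y.next, and ('assign', x, y).

def modified_vars(stmt):
    op = stmt[0]
    if op == 'malloc':
        return {stmt[1], 'm'}            # malloc also updates the variable m
    if op in ('next', 'assign'):         # x := y.next and x := y
        return {stmt[1]}
    return set()                         # x.next := y, free, assume

def subst(phi, renaming):
    """Rename variables in a tuple-encoded formula (operators are never keys)."""
    if isinstance(phi, str):
        return renaming.get(phi, phi)
    return tuple(subst(part, renaming) for part in phi)

def frame_step(Q_i, stmt, local_pre, local_post):
    """From Q_i = A * local_pre and the local rule {local_pre} stmt {local_post},
    derive Q_{i+1} = A[x'/x] * local_post."""
    assert Q_i[0] == 'star' and Q_i[2] == local_pre, "materialization needed first"
    renaming = {x: x + "'" for x in modified_vars(stmt)}
    return ('star', subst(Q_i[1], renaming), local_post)
```

For example, executing x.next := y (which modifies no program variable) on Q_i = ls(a, nil) ∗ x ↦ z with the local rule {x ↦ z} x.next := y {x ↦ y} yields ls(a, nil) ∗ x ↦ y.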

Applying the forward symbolic execution calculus for verification. We now explain how the proof rules presented in Figs. 8 and 9 can be used for program verification. Our goal is to verify that the pre-condition P of a loop-free piece of code c (in our case, a sequence of program statements) implies the post-condition Q. For this, we apply the symbolic execution calculus and derive a triple {P} c {Q′}. It then remains to verify that the final state Q′ of the symbolic execution implies the post-condition Q. Here, we face the difficulty that the symbolic execution introduces additional variables: Let us assume that all annotations are over a set of variables **x**, which includes the program variables and the logical variables. Further assume that the symbolic execution {P} c {Q′} introduced the fresh variables **y**. With the results of Section 3 we can then verify the entailment Q′ ⊨_{**x**∪**y**} Q. However, we need to guarantee that all models (s, h) of Q with dom(s) = **x** ∪ **y** are also models when we restrict dom(s) to **x** (note that we can think of the variables **y** as implicitly existentially quantified). In order to deal with this issue, we require annotations to be robust:

```
{ls(x, nil)} % list reverse
    a := nil;
    while(x ≠ nil)
    {ls(x, nil) ∗ ls(a, nil)}
    { b := x.next;
      x.next := a;
      a := x;
      x := b; }
{ls(a, nil)}

{ls(x, nil) ∗ u = x} % list copy
    malloc(s);
    r := s;
    while(x ≠ nil)
    {ls(u, x) ∗ ls(x, nil) ∗ ls(r, s) ∗ s → m}
    { malloc(t);
      % t.data := x.data; not modelled
      s.next := t;
      s := t;
      y := x.next;
      x := y; }
    s.next := nil;
{ls(u, nil) ∗ ls(r, nil)}
```
Fig. 10: List reverse and list copy, annotated with pre- and post-conditions and loop invariants.
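For readers who prefer executable code, the two programs of Fig. 10 can be rendered in Python as follows. This is a hypothetical transcription: the Node class, the val field (used only for testing), and the Python names are ours, and, as in the figure, the copy does not transfer data:

```python
class Node:
    """A heap cell with a single next field, as in the paper's heap model.
    The val field is an addition for testing only; data is not modelled."""
    def __init__(self, val=None, nxt=None):
        self.val = val
        self.next = nxt

def reverse(x):
    # a := nil; while (x != nil) { b := x.next; x.next := a; a := x; x := b }
    a = None
    while x is not None:
        b = x.next
        x.next = a
        a = x
        x = b
    return a  # head of the reversed list, ls(a, nil)

def copy(x):
    # malloc(s); r := s; then append one fresh cell per input cell
    s = Node()          # malloc(s): head cell of the copy
    r = s
    while x is not None:
        t = Node()      # malloc(t)
        s.next = t
        s = t
        x = x.next      # y := x.next; x := y
    s.next = None       # s.next := nil
    return r            # ls(r, nil)
```

Note that the copy is one cell longer than its input (the cell allocated by malloc(s) before the loop); the post-condition ls(u, nil) ∗ ls(r, nil) of the figure relates the original list (reachable from u) and the fresh copy (reachable from r).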

**Definition 11 (Robust Formula).** We call a formula ϕ ∈ **SL** robust if for all models (s₁, h) and (s₂, h) with fvs(ϕ) ⊆ dom(s₁), fvs(ϕ) ⊆ dom(s₂) and s₁(x) = s₂(x) for all x ∈ fvs(ϕ), we have that (s₁, h) ⊨st ϕ iff (s₂, h) ⊨st ϕ.

**Lemma 23.** Let ϕ ∈ **SL** be a positive formula. Then, ϕ is robust.

Lemma 23 states that all formulas from the positive fragment are robust. In particular, the annotations in Fig. 10 are robust. As an example of a non-robust formula, consider the formula ϕ from Example 1. We note that Lemma 23 does not cover all robust formulas, e.g., **t** is robust. We leave the identification of further robust formulas for future work.
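Definition 11 can be made concrete with a small model checker. The following Python sketch (our own encoding, not the paper's tooling) evaluates emp, points-to, equalities and ∗ over explicit stack-heap pairs, and tests robustness on a finite sample of stacks; for this atomic fragment the standard split of the heap suffices as an illustration of ∗:

```python
from itertools import product

# Stacks map variables to locations (0 plays the role of nil); heaps map
# allocated locations to locations. Formulas are tagged tuples (our own
# encoding): ('emp',), ('pto', x, y), ('eq', x, y), ('star', f, g).
def sat(stack, heap, f):
    tag = f[0]
    if tag == 'emp':
        return len(heap) == 0
    if tag == 'eq':
        return stack[f[1]] == stack[f[2]]
    if tag == 'pto':   # x -> y: the heap is exactly the one cell
        return heap == {stack[f[1]]: stack[f[2]]}
    if tag == 'star':  # try every split of the heap into disjoint parts
        cells = list(heap)
        return any(
            sat(stack, {c: heap[c] for c, b in zip(cells, bits) if b}, f[1])
            and sat(stack, {c: heap[c] for c, b in zip(cells, bits) if not b}, f[2])
            for bits in product([False, True], repeat=len(cells)))
    raise ValueError(tag)

def robust_on(f, fvs, heap, stacks):
    # Definition 11, checked on a finite sample of stacks: satisfaction
    # may only depend on the values assigned to the free variables fvs.
    seen = {}
    for s in stacks:
        key = tuple(s[v] for v in fvs)
        val = sat(s, heap, f)
        if key in seen and seen[key] != val:
            return False
        seen[key] = val
    return True
```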

We now state the soundness of our symbolic execution calculus:

**Lemma 24 (Soundness of Forward Symbolic Execution).** Let **c** be a sequence of program statements, let P be a robust formula, let {P} **c** {Q} be the triple obtained from symbolic execution, and let V be the fresh variables introduced during symbolic execution. Then, Q is robust and for all stack-heap pairs (s, h), (s′, h′) such that (s, h) ⊨st P and (s′, h′) can be obtained from (s, h) by executing **c**, there is a stack s′′ with s′ ⊆ s′′, V ⊆ dom(s′′) and (s′′, h′) ⊨st Q.

Automation. We note that the presented approach can fully automatically verify that the pre-condition of a loop-free piece of code guarantees its post-condition: For every program statement, we apply its local proof rule using the frame rule (and, in addition, the materialization rule in case the existence of a pointer target must be guaranteed). We then discharge the entailment query using our decision procedure from Section 3. We now illustrate this approach on the programs from Fig. 10. For both programs we verify that the loop invariant is inductive: in both cases the loop invariant P is propagated forward through the loop body; it is then checked that the obtained formula Q again implies the loop invariant P; for verifying this implication we apply our decision procedure from Corollary 4.

Example 6. Verifying the loop invariant of list reverse:

```
{ls(x, nil) ∗ ls(a, nil)}                                                    (=: P)
    assume(x ≠ nil)
{ls(x, nil) ∗ ls(a, nil) ∗ x ≠ nil}
    % materialization
{x → z −⊛ (ls(x, nil) ∗ ls(a, nil) ∗ x ≠ nil) ∗ x → z}
    b := x.next
{x → z −⊛ (ls(x, nil) ∗ ls(a, nil) ∗ x ≠ nil) ∗ x → z ∗ b = z}
    x.next := a
{x → z −⊛ (ls(x, nil) ∗ ls(a, nil) ∗ x ≠ nil) ∗ x → a ∗ b = z}
    a := x
{x → z −⊛ (ls(x, nil) ∗ ls(a′, nil) ∗ x ≠ nil) ∗ x → a′ ∗ b = z ∗ a = x}
    x := b
{x′ → z −⊛ (ls(x′, nil) ∗ ls(a′, nil) ∗ x′ ≠ nil) ∗ x′ → a′ ∗ b = z ∗ a = x′ ∗ x = b}  (=: Q)
⊨st {ls(x, nil) ∗ ls(a, nil)}                                                (=: P)
```

Example 7. Verifying the loop invariant of list copy:

```
{ls(u, x) ∗ ls(x, nil) ∗ ls(r, s) ∗ s → m}                                   (=: P)
    assume(x ≠ nil)
{ls(u, x) ∗ ls(x, nil) ∗ ls(r, s) ∗ s → m ∗ x ≠ nil}
    malloc(t)
{ls(u, x) ∗ ls(x, nil) ∗ ls(r, s) ∗ s → m′ ∗ x ≠ nil ∗ t → m}
    s.next := t
{ls(u, x) ∗ ls(x, nil) ∗ ls(r, s) ∗ s → t ∗ x ≠ nil ∗ t → m}
    s := t
{ls(u, x) ∗ ls(x, nil) ∗ ls(r, s′) ∗ s′ → t ∗ x ≠ nil ∗ t → m ∗ s = t}
    % materialization
{x → z −⊛ (ls(u, x) ∗ ls(x, nil) ∗ ls(r, s′) ∗ s′ → t ∗ x ≠ nil ∗ t → m ∗ s = t) ∗ x → z}
    y := x.next
{x → z −⊛ (ls(u, x) ∗ ls(x, nil) ∗ ls(r, s′) ∗ s′ → t ∗ x ≠ nil ∗ t → m ∗ s = t) ∗ x → z ∗ y = z}
    x := y
{x′ → z −⊛ (ls(u, x′) ∗ ls(x′, nil) ∗ ls(r, s′) ∗ s′ → t ∗ x′ ≠ nil ∗ t → m ∗ s = t) ∗ x′ → z ∗ y = z ∗ x = y}  (=: Q)
⊨st {ls(u, x) ∗ ls(x, nil) ∗ ls(r, s) ∗ s → m}                               (=: P)
```

While our decision procedure can automatically discharge the entailments in both of the above examples, we give, for the benefit of the reader, a short direct argument for the entailment check of Example 6 (a direct argument could similarly be worked out for Example 7): We note that Q simplifies to Q′ = a → x −⊛ (ls(a, nil) ∗ ls(a′, nil)) ∗ a → a′. Every model (s, h) of Q′ must consist of a pointer a → a′, a list segment ls(a′, nil) and a heap h′ to which the pointer a → x can be added in order to obtain the list segment ls(a, nil); by looking at the semantics of the list segment predicate we see that h′ in fact must be the list segment ls(x, nil). Further, the pointer a → a′ can be composed with the list segment ls(a′, nil) in order to obtain ls(a, nil).
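The automation described above has a very small driver loop. The sketch below (Python; the two callbacks are placeholders standing in for the calculus of Figs. 8 and 9 and for the decision procedure, neither of which is implemented here) shows its shape:

```python
def check_invariant(P, body, apply_local_rule, entails):
    """Check that loop invariant P is inductive for the given loop body.

    apply_local_rule(Q, stmt) is assumed to perform one frame-rule step
    (plus materialization where needed); entails(Q, P) is assumed to be
    a decision procedure for Q |= P. Both are hypothetical stand-ins.
    """
    Q = P
    for stmt in body:
        Q = apply_local_rule(Q, stmt)  # propagate forward through the body
    return entails(Q, P)               # does the result re-establish P?
```

With toy stand-ins (formulas as sets of atoms, a rule that adds atoms, entailment as set inclusion), the driver can be exercised directly.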

# **5 Normal Forms and the Abduction Problem**

In this section, we discuss how every AMS can be expressed by a formula, which in turn makes it possible to compute a normal form for every formula. We then discuss how the normal form transformation has applications to the abduction problem.

Normal Form. We lift the abstraction functions from stacks to sets of variables: Let **x** ⊆ **Var** be a finite set of variables and ϕ ∈ **SL** be a formula with fvs(ϕ) ⊆ **x**. We set α**x**(ϕ) := {α**s**(ϕ) | dom(s) = **x**} and abst**x**(ϕ) := α**x**(ϕ) ∩ **AMS**(ϕ),**x**, where **AMS**k,**x** := {⟨V, E, ρ, γ⟩ ∈ **AMS** | ⋃V = **x** and γ ≤ k}. (We note that α**x**(ϕ) is computable by the same argument as in the proof of Theorem 5.)

**Definition 12 (Normal Form).** Let NF**x**(ϕ) := ⋁_{A ∈ α**x**(ϕ)} AMS2SLᵐ(A) be the normal form of ϕ, where AMS2SLᵐ(A) is defined as in Fig. 11.

The definition of AMS2SL<sup>m</sup>(A) represents a straightforward encoding of the AMS A: aliasing encodes the aliasing between the stack variables as implied by V ; graph encodes the points-to assertions and lists of length at least two corresponding to the edges E; negalloc encodes that the negative chunks R ∈ ρ precisely allocate the variables **v** ∈ R; garbage ensures that there are either exactly γ additional non-empty memory chunks that do not allocate any stack variable (if γ<m) or at least γ such chunks (if γ = m); negalloc and garbage use the formula negchunk which precisely encodes the definition of a negative chunk. We have the following result about normal forms:

**Theorem 6.** NF**x**(ϕ) ⊨st,**x** ϕ and ϕ ⊨st,**x** NF**x**(ϕ).

The abduction problem. We consider the following relaxation of the entailment problem: The abduction problem is to replace the question mark in the entailment ϕ ∗ [?] ⊨st,**x** ψ by a formula such that the entailment becomes true. This problem

AMS2SLᵐ(A) := aliasing(A) ∗ graph(A) ∗ negalloc(A) ∗ garbageᵐ(A)

aliasing(A) := ∗_{**v**∈V, x,y∈**v**} x = y ∗ ∗_{**v**,**w**∈V, **v**≠**w**} max(**v**) ≠ max(**w**)

graph(A) := ∗_{E(**v**)=⟨**v**′,=1⟩} max(**v**) → max(**v**′) ∗ ∗_{E(**v**)=⟨**v**′,≥2⟩} ls≥₂(max(**v**), max(**v**′))

negalloc(A) := ∗_{R∈ρ} (negchunk(A) ∧ ⋀_{**v**∈R} alloc(max(**v**)) ∧ ⋀_{**v**∈V∖R} ¬alloc(max(**v**)))

garbageᵐ(A) := garbage(A, γ), if γ < m; garbage(A, m−1) ∗ (¬**emp** ∧ ⋀_{**v**∈V} ¬alloc(max(**v**))), otherwise

garbage(A, k) := **emp**, if k = 0; garbage(A, k−1) ∗ (negchunk(A) ∧ ⋀_{**v**∈V} ¬alloc(max(**v**))), otherwise

negchunk(A) := ¬**emp** ∧ ¬(¬**emp** ∗ ¬**emp**) ∧ ⋀_{**v**,**w**∈V, ϕ∈{max(**v**)→max(**w**), ls(max(**v**),max(**w**))}} ¬ϕ

alloc(x) := ¬((x → nil) −⊛ **t**)

ls≥₂(x, y) := ls(x, y) ∧ ¬(x → y)

Fig. 11: The induced formula AMS2SLᵐ(A) of the AMS A = ⟨V, E, ρ, γ⟩ with γ ≤ m.
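The recursive structure of garbage and garbageᵐ in Fig. 11 is easiest to see when rendered as a formula-string builder. The following Python sketch uses our own naming, and abbreviates negchunk(A) and the non-allocation conjunct over all stack variables as the atoms negchunk and noalloc:

```python
def garbage(k):
    # garbage(A, k): emp for k = 0, otherwise one more chunk that is a
    # negative chunk and allocates no stack variable.
    if k == 0:
        return "emp"
    return garbage(k - 1) + " * (negchunk /\\ noalloc)"

def garbage_m(gamma, m):
    # gamma < m: exactly gamma extra chunks. gamma = m: at least m chunks,
    # so the last factor only requires a non-empty, non-allocating remainder.
    if gamma < m:
        return garbage(gamma)
    return garbage(m - 1) + " * (~emp /\\ noalloc)"
```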

is central for obtaining a scalable program analyzer, as discussed in [10].⁹ The abduction problem in general does not have a unique solution. Following [10], we thus consider optimization versions of the abduction problem, looking for logically weakest and spatially minimal solutions:

**Definition 13.** Let ϕ, ψ ∈ **SL** and **x** ⊆ **Var** be a finite set of variables. A formula ζ is the weakest solution to the abduction problem ϕ ∗ [?] ⊨st,**x** ψ if for all abduction solutions ζ′ it holds that ζ′ ⊨st,**x** ζ. An abduction solution ζ is minimal if there is no abduction solution ζ′ with ζ ⊨st,**x** ζ′ ∗ (¬**emp**).

**Lemma 25.** Let ϕ, ψ be formulas and let **x** ⊆ **Var** be a finite set of variables. Then, 1) the weakest solution to the abduction problem ϕ ∗ [?] ⊨st,**x** ψ is given by the formula ϕ −∗ ψ, and 2) the weakest minimal solution is given by the formula ϕ −∗ ψ ∧ ¬((ϕ −∗ ψ) ∗ ¬**emp**).

<sup>9</sup> While the program analyzer proposed in [10] employs bi-abductive reasoning, the biabduction procedure in fact proceeds in two separate abduction and frame-inference steps, where the main technical challenge is the abduction step, as frame inference can be incorporated into entailment checking. We believe that the situation for SSL is similar, i.e., solving abduction is the key to implementing a bi-abductive prover for SSL; hence, our focus on the abduction problem.

We now explain how the normal form has applications to the abduction problem. According to Lemma 25, the best solutions to the abduction problem are given by the formulas ζ := ϕ −∗ ψ and ζ′ := ϕ −∗ ψ ∧ ¬((ϕ −∗ ψ) ∗ ¬**emp**). While the existence of these solutions is thus guaranteed, we do not a priori have a means to compute an explicit representation of these solutions, nor to further analyze their structure. However, the normal form operator allows us to obtain the explicit representations NF**x**(ζ) and NF**x**(ζ′). We believe that using these explicit representations in a program analyzer, or studying their properties, is an interesting topic for further research. Here, we establish one concrete result on solutions to the abduction problem based on normal forms:

We can compute the weakest resp. the weakest minimal solution to the abduction problem from the positive fragment. Observe that among the sub-formulas of aliasing and graph, only the formula ls≥₂ is negative. To be able to use ls≥₂(x, y) in a positive formula, we first need to add a new spatial atom ls≥₂(x, y) to SSL with the following semantics: ls≥₂(x, y) holds in a model iff the model is a list segment of length at least 2 from x to y. (The whole development in Sections 2 and 3 can be extended by this predicate.) We can then simplify the formula graph(A) in AMS2SLᵐ(A) by directly translating edges E(**v**) = ⟨**v**′, ≥2⟩ to the atom ls≥₂(max(**v**), max(**v**′)). Then, ⋁_{⟨V,E,ρ,γ⟩ ∈ α**x**(ζ), ρ=∅, γ=0} AMS2SLᵐ(A) for ζ = ϕ −∗ ψ resp. ζ = ϕ −∗ ψ ∧ ¬((ϕ −∗ ψ) ∗ ¬**emp**) is the weakest resp. the weakest minimal solution to the abduction problem from the positive fragment.
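Lemma 25's characterization of the weakest solution can be checked by brute force on finite models. In the sketch below (our own construction, not the paper's procedure), phi and psi are arbitrary heap predicates, heaps range over a small finite location set, and phi −∗ psi is computed as the set of heaps whose every disjoint phi-extension yields a psi-heap:

```python
import itertools

def all_heaps(locs, vals):
    # every partial map from locs to vals
    for r in range(len(locs) + 1):
        for dom in itertools.combinations(locs, r):
            for img in itertools.product(vals, repeat=r):
                yield dict(zip(dom, img))

def wand(phi, psi, locs, vals):
    # h |= phi -* psi  iff every phi-heap disjoint from h composes
    # with h to a psi-heap.
    def holds(h):
        return all(psi({**h, **h1})
                   for h1 in all_heaps(locs, vals)
                   if phi(h1) and not (set(h1) & set(h)))
    return [h for h in all_heaps(locs, vals) if holds(h)]
```

For phi = "exactly the cell 1 ↦ 0" and psi = "exactly {1 ↦ 0, 2 ↦ 0}", the weakest solution contains {2: 0} (the minimal solution) but also every heap that already allocates location 1, for which the condition holds vacuously; this is exactly why Lemma 25 additionally singles out the weakest minimal solution.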

# **6 Conclusion**

We have shown that the satisfiability problem for "strong" separation logic with lists is in the same complexity class as the satisfiability problem for standard "weak" separation logic without any data structures: PSpace-complete. This is in stark contrast to the undecidability result for standard (weak) SL semantics, as shown in [16].

We have demonstrated the potential of SSL for program verification: 1) We have provided symbolic execution rules that, in conjunction with our result on the decidability of entailment, can be used for fully-automatically discharging verification conditions. 2) We have discussed how to compute explicit representations to optimal solutions of the abduction problem. This constitutes the first work that addresses the abduction problem for a separation logic closed under Boolean operators and the magic wand.

We consider our results to be first steps in the study of strong-separation logic, motivated by the desire to circumvent the undecidability result of [16]. Future work includes the practical evaluation of our decision procedures, extending the symbolic execution calculus to a full Hoare logic, and extending the results of this paper to richer separation logics (SL), such as SL with nested data structures or SL with limited support for arithmetic reasoning.

# **References**


Twenty-Third EACSL Annual Conference on Computer Science Logic (CSL) and the Twenty-Ninth Annual ACM/IEEE Symposium on Logic in Computer Science (LICS), CSL-LICS '14, pages 37:1–37:10, New York, NY, USA, 2014. ACM.


Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2019, Prague, Czech Republic, April 6-11, 2019, Proceedings, Part II, pages 319–336, 2019.


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/ 4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Author Index

Abdulla, Parosh Aziz 1 Athaiya, Snigdha 30 Atig, Mohamed Faouzi 1

Baillot, Patrick 59 Bardin, Sébastien 148 Bartocci, Ezio 491 Basold, Henning 375 Beillahi, Sidi Mohamed 87 Beringer, Lennart 118 Bobot, François 148 Borgström, Johannes 404 Bouajjani, Ahmed 87 Broman, David 404

Chareton, Christophe 148 Cheney, James 579 Chong, Stephen 207

Das, Ankush 178 DeYoung, Henry 178 Dimoulas, Christos 635

Eades III, Harley 462 Enea, Constantin 87

Farina, Gian Pietro 207 Findler, Robert Bruce 635

Gaboardi, Marco 207, 234 Ghyselen, Alexis 59 Godbole, Adwait 1 Goldstein, Harrison 264

Haslbeck, Maximilian P. L. 292 Hughes, John 264

Ish-Shalom, Oren 320 Itzhaky, Shachar 320

Jaber, Guilhem 348, 548

Katoen, Joost-Pieter 491 Katsumata, Shin-ya 234

Keizer, Alex C. 375 Komondoor, Raghavan 30 Kovács, Laura 491 Krishna, S. 1 Kumar, K. Narayan 30

Lammich, Peter 292 Lampropoulos, Leonidas 264 Lundén, Daniel 404

Mak, Carol 432 Moon, Benjamin 462 Moosbrugger, Marcel 491 Mordido, Andreia 178 Murawski, Andrzej S. 348

Ong, C.-H. Luke 432 Orchard, Dominic 234, 462

Pagel, Jens 664 Paquet, Hugo 432, 519 Pérez, Jorge A. 375 Perrelle, Valentin 148 Pfenning, Frank 178 Pierce, Benjamin C. 264

Riba, Colin 548 Ricciotti, Wilmer 579 Rinetzky, Noam 320

Sato, Tetsuya 234 Shoham, Sharon 320

Vafeiadis, Viktor 1 Vákár, Matthijs 607 Valiron, Benoît 148

Wagner, Dominik 432

You, Shu-Hung 635

Zuleger, Florian 664